Coronavirus COVID-19 Cases in Finland

Bernardo Di Chiara, Data Analyst

http://fi.linkedin.com/in/bernardodichiara

Last full updates to the comments: June 16th 2020

Last plotted day: see the end of this file

Table of Contents

1. Executive Summary
....1.1. References
2. Setup
3. Defining the Needed Functions
....3.1. Dataframes and Lists Handling
....3.2. Plots
....3.3. Project-specific Functions
4. Dumping and Collecting the Data
5. Data Analysis
....5.1. Summary
....5.2. Preliminary Data Analysis
....5.3. Data Cleansing
....5.4. Data Preparation
............5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries
............5.4.2. Population age data
............5.4.3. World Data
............5.4.4. Finnish Data
............5.4.5. Data from other Scandinavian Countries and Estonia
............5.4.6. Data from other European Countries
............5.4.7. Data from UK and US
............5.4.8. Data from Brazil, Russia and India
............5.4.9. Data from China
....5.5. Summary of the Created Datasets
6. Domain-Specific Concepts
7. Data Visualization
....7.1. Overview
............7.1.1. General Comments to the Plots
............7.1.2. A Reference Curve Set
....7.2. Finnish Internal Situation
....7.3. Comparison with the Closest Neighboring Countries
....7.3.1. Comparison with Other Scandinavian Countries and Estonia
....7.4. Comparison with other European Countries
....7.5. UK and US
....7.6. Brazil, Russia and India
....7.7. Normalizing by Country population
............7.7.1. List of Variables Affecting Potentially the Curves
............7.7.2. Confirmed Cases: Summary of Findings from the Analysis
............7.7.3. Deceased Cases: Summary of Findings from the Analysis
....7.8. Demographic Considerations
....7.9. Normalizing by Country Population and Population Density
....7.10. Situation in China
....7.11. Situation in Italy
....7.12. World View
............7.12.1. Lethality
8. Statistics
....8.1. World view
....8.2. Top Ten Countries
....8.3. Finland
9. Conclusions
10. Acknowledgements

1. Executive Summary

This notebook contains visualizations related to the spread of the Coronavirus COVID-19 with a focus on Finland.

The data is taken from the Johns Hokpins University (JHU) /1/.

There are a few good dashboards in the Web about to this topic (for example, by Johns Hokpins University /2/ and by Tableau /3/). In addition, there is a good site with latest information about Finland broken down by Region /4/. Another very useful source of information is the European Centre for Disease Prevention and Control /5/. Still, it might be beneficial to manipulate the data in order, for example, to compare Finnish curves with curves from other Countries.

Having updated charts is very useful both for authorities and for the population in order to make fact-based decisions that help to contain the positive cases so not to overload the hospitals and therefore minimizing the casualties.

Comparing Finnish curves to those of neighboring Countries might provide useful insights since, in addition to the geographical proximity and similar weather, those Countries have certain similarities in culture, behavior patterns and may be genetics.

Sections from 2 to 5 contain mostly code which is needed to define the used functions and to dump, cleanse and prepare the data.

General domain specific concepts are contained in section 6. An overview chapter containing a description of the plots and the illustration of a reference case is contained at the beginning of section 7.

Line plots containing confirmed cases each day as well as recovered and deceased cases have been produced. The active cases have been shown in the same plot.

Other plots containing the new confirmed daily cases, which shows the speed at which the virus is spreading, have been added as well. Daily increments have been plotted also for the deceased and the active cases.

Finnish curves have been compared to the curves of the other Scandinavian Countries as well as few other European Countries. Curves of UK, US, Brazil, Russia and India have been plotted as well.

Plots showing the number of confirmed cases per capita have been created to eliminate the population variable from the comparisons. Other plots have been created to normalize by the density of the population.

Finally, plots with worldwide data have been produced. This includes also a couple of plots that try to put the number of deceased cases into context.

Bar plots containing data of the most affected Countries have been added.

Due to the criticality of this information, no recommendations are included in this paper. Currently, Doctors and Authorities are the best sources for such recommendations.

If you are not interested in the code, go to section 6 and onward and focus on the plots, the tables and the plain text.

DISCLAIMER:

  • The code has not been peer-reviewed. If someone is wishing to do it, please contact the author.
  • The data related to the last day might be incomplete.
  • See also the legal disclaimer.

The spread of virus follows the rules of mathematics and statistics (Dr. Katharina Hauck, https://www.imperial.ac.uk/people/k.hauck).

1.1. References

/1/ [GitHub Repository by Johns Hokpins University](https://github.com/CSSEGISandData/COVID-19)
https://github.com/CSSEGISandData/COVID-19

/2/ [Dashboard by Johns Hokpins University with world-wide view](https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

/3/ Dashboard by Tableau with both global and Country-specific data
https://public.tableau.com/profile/covid.19.data.resource.hub#!/vizhome/COVID-19Cases_15840488375320/COVID-19Cases

/4/ Latest news about Finland broken by Region
https://finland-coronavirus-map.netlify.com/

/5/ European Centre for Disease Prevention and Control
https://www.ecdc.europa.eu/en/novel-coronavirus-china

/6/ Coursera: Let's Talk About COVID-19
https://www.coursera.org/learn/covid-19/home/welcome

2. Setup

In [1]:
# Importing the needed packages
import os
import datetime as dt
import regex as re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Displaying all the dafaframe columns
pd.set_option('display.max_columns', None)

# Setting a time stamp
start_time = dt.datetime.utcnow()

3. Defining the Needed Functions

3.1. Dataframes and Lists Handling

In [2]:
def df_basic_data(dfname):
    '''
    This function prints basic information about a given dataframe.
    The function needs as input parameters the dataframe name.
    '''

    import pandas as pd

    # Fetching the dataframe name
    name = [x for x in globals() if globals()[x] is dfname][0]
    print("Dataframe name:", name, "\n")
    print("Dataframe length:", len(dfname), "\n")
    print("Number of columns:", len(dfname.columns), "\n")
    # Columns data types
    data_types = dfname.dtypes
    # Distint values
    distint_values = dfname.apply(pd.Series.nunique)
    # Amount of null values
    null_values = dfname.isnull().sum()
    print("Dataframe's columns names, column data types, amount of distint "
          "(non null) values\n"
          "and amount of null values for each column:")
    df_index = ['Data_Type',
                'Amount_of_Distint_Values',
                'Amount_of_Null_Values']
    col_types_dist_null = pd.DataFrame([data_types,
                                        distint_values,
                                        null_values],
                                       index=df_index)
    return col_types_dist_null.transpose()
In [3]:
def calc_increments(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    returns the result in a new list having the same lenght as the original list
    '''

    # Initializing an empty list of floats to contain the increments
    increments = []
    # Adding zero to the first element
    increments.append(0.0)
    # Looping through all the occurrencies except the first one
    for i in list(range(1, len(listname))):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Adding the result to the list
        increments.append(delta)
    # Returning the result
    return increments
In [4]:
def find_neg_increm(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    checks if the increment is negative and
    returns the result in a new list with boolean values having the same lenght as the original list
    '''

    # Initializing an empty list of floats to contain the increments
    neg_increments = []
    # Adding zero to the first element
    neg_increments.append(0)
    # Looping through all the occurrencies except the first one
    for i in list(range(1, len(listname))):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Checking if the increment is negative
        if delta < 0:
            neg_increments.append(1)
        else:
            neg_increments.append(0)
    # Returning the result
    return neg_increments

3.2. Plots

In [5]:
def cust_line_plot(*parameters,
                   figsize_w=8, figsize_h=6,
                   title=None,
                   title_fs=16, title_offset=20,
                   rem_borders=False,
                   label_fs=12, tick_fs=6, 
                   x_label=None,
                   rot=0,
                   y_label=None,
                   legend=False, leg_fs=10, legend_loc=0,
                   first_line_x=None, first_line_col=7,
                   first_line_ls=':', first_line_x_l=None,
                   second_line_x=None, second_line_col=7,
                   second_line_ls='--', second_line_x_l=None,
                   third_line_x=None, third_line_col=7,
                   third_line_ls='-.', third_line_x_l=None,
                   fourth_line_x=None, fourth_line_col=7,
                   fourth_line_ls='-', fourth_line_x_l=None,
                   fifth_line_x=None, fifth_line_col=8,
                   fifth_line_ls=':', fifth_line_x_l=None,
                   sixth_line_x=None, sixth_line_col=8,
                   sixth_line_ls='--', sixth_line_x_l=None,
                   seventh_line_x=None, seventh_line_col=8,
                   seventh_line_ls=':', seventh_line_x_l=None,
                   eighth_line_x=None, eighth_line_col=8,
                   eighth_line_ls='-', eighth_line_x_l=None):
    """
    This function plots a scatterplot for the provided data
    and customizes the way the chart looks by using the value of
    the provided parameters.

    Keyword arguments:
    parameters       -- A (mandatory) tuple of 5 elements containing:
                        a list with the x values,
                        a list with the y values,
                        a string containing the selected marker,
                        a string containing the selected line style,
                        an integer (from 0 to 9) selecting the seaborn-deep
                        color,
                        a string containing the text for the legend
    figsize_w        -- The width of the plot area
    figsize_w        -- The height of the plot area
    title            -- A string containing the title of the chart
    title_fs         -- The title font size
    title_offset     -- Distance between the title and the top of the chart
    rem_borders      -- If True the top and right borders are removed
                        (default: False)
    label_fs         -- x and y axis labels' font size
    tick_fs          -- The tick values font size
    x_label          -- Label for the x-axis (string)
    rot              -- The rotation angle of the tick values
    y_label          -- Label for the y-axis (string)
    legend           -- A boolean variable that tells if to plot a legend
    leg_fs           -- Font size for the legend
    legend_loc       -- An integer from 0 to 9 controlling the legend location
    first_line_x
    ...
    eighth_line_x    -- x coordinates of vertical lines
    first_line_col
    ...
    eighth_line_col -- an integer (from 0 to 9) selecting the seaborn-deep
                        color of the corresponding line    
    first_line_x_l
    ...
    eighth_line_x_l -- legend text for the corresponding lines
    """

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=title_offset)
    # Removing the top and right borders if so defined
    if rem_borders is True:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # Initializing an empy list to contain the legend text
    leg_text_l = []
    for param in parameters:
        # Extracting the values given in parameters
        x = param[0]
        y = param[1]
        mark = param[2]
        ls = param[3]
        col_numb = param[4]
        leg_text = param[5]
        # Appending the string to the list
        leg_text_l.append(leg_text)
        # Creating the scatter plots
        plot = plt.plot(x, y, marker=mark, linestyle=ls, color=color_list[col_numb])

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    plt.xticks(fontsize=tick_fs, rotation=rot)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)
    plt.yticks(fontsize=tick_fs)

    # Adding vertical lines
    if first_line_x:
        plt.axvline(x=first_line_x, color=color_list[first_line_col],
                    linestyle=first_line_ls)
        leg_text_l.append(first_line_x_l)
    if second_line_x:
        plt.axvline(x=second_line_x, color=color_list[second_line_col],
                    linestyle=second_line_ls)
        leg_text_l.append(second_line_x_l)
    if third_line_x:        
        plt.axvline(x=third_line_x, color=color_list[third_line_col],
                    linestyle=third_line_ls)
        leg_text_l.append(third_line_x_l)
    if fourth_line_x:        
        plt.axvline(x=fourth_line_x, color=color_list[fourth_line_col],
                    linestyle=fourth_line_ls)
        leg_text_l.append(fourth_line_x_l)
    if fifth_line_x:        
        plt.axvline(x=fifth_line_x, color=color_list[fifth_line_col],
                    linestyle=fifth_line_ls)
        leg_text_l.append(fifth_line_x_l)
    if sixth_line_x:        
        plt.axvline(x=sixth_line_x, color=color_list[sixth_line_col], 
                    linestyle=sixth_line_ls)
        leg_text_l.append(sixth_line_x_l)
    if seventh_line_x:        
        plt.axvline(x=seventh_line_x, color=color_list[seventh_line_col], 
                    linestyle=seventh_line_ls)
        leg_text_l.append(seventh_line_x_l)
    if eighth_line_x:        
        plt.axvline(x=eighth_line_x, color=color_list[eighth_line_col], 
                    linestyle=eighth_line_ls)
        leg_text_l.append(eighth_line_x_l) 
    
    # Adding a legend
    if legend:
        plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Showing the plot without additional text
    plt.show()
In [6]:
def cust_bar_plot(parameters,
                  figsize_w=8, figsize_h=6,
                  title=None, title_fs=16, title_offset=20,
                  rem_borders=False,
                  label_fs=12, tick_fs=6,
                  x_label=None,
                  rot=0,
                  y_label=None,
                  legend=False,
                  leg_fs=10,
                  legend_loc=0,
                  first_line_x=None, first_line_col=7,
                  first_line_ls=':', first_line_x_l=None,
                  second_line_x=None, second_line_col=7,
                  second_line_ls='--', second_line_x_l=None,
                  third_line_x=None, third_line_col=7,
                  third_line_ls='-.', third_line_x_l=None,
                  fourth_line_x=None, fourth_line_col=7,
                  fourth_line_ls='-', fourth_line_x_l=None,
                  fifth_line_x=None, fifth_line_col=8,
                  fifth_line_ls=':', fifth_line_x_l=None,
                  sixth_line_x=None, sixth_line_col=8,
                  sixth_line_ls='--', sixth_line_x_l=None,
                  seventh_line_x=None, seventh_line_col=8,
                  seventh_line_ls=':', seventh_line_x_l=None,
                  eighth_line_x=None, eighth_line_col=8,
                  eighth_line_ls='-', eighth_line_x_l=None,                  
                  first_line_y=None, first_line_y_l=None,
                  second_line_y=None, second_line_y_l=None,
                  third_line_y=None, third_line_y_l=None,
                  fourth_line_y=None, fourth_line_y_l=None):
    """
    This function plots a bar plot for the provided data
    and customizes the way the chart looks by using the value of
    the provided parameters.

    Keyword arguments:
    parameters       -- A (mandatory) tuple of 4 elements containing:
                        a list with the x values,
                        a list with the y values,
                        an integer (from 0 to 9) selecting the seaborn-deep
                        color,
                        a string containing the text for the legend
    figsize_w        -- The width of the plot area
    figsize_w        -- The height of the plot area
    title            -- A string containing the title of the chart
    title_fs         -- The title font size
    title_offset     -- Distance between the title and the top of the chart
    rem_borders      -- If True the top and right borders are removed
                        (default: False)
    label_fs         -- x and y axis labels' font size
    tick_fs          -- The tick values font size
    x_label          -- Label for the x-axis (string)
    rot              -- The rotation angle of the tick values
    y_label          -- Label for the y-axis (string)
    legend           -- A boolean variable that tells if to plot a legend
    leg_fs           -- Font size for the legend
    legend_loc       -- An integer from 0 to 9 controlling the legend location
    first_line_x
    second_line_x
    third_line_x     
    fourth_line_x
    fifth_line_x     
    sixth_line_x     -- x coordinates of vertical lines
    first_line_col
    second_line_col
    third_line_col     
    fourth_line_col
    fifth_line_col
    sixth_line_col   -- an integer (from 0 to 9) selecting the seaborn-deep
                        color of the corresponding line    
    first_line_x_l
    ...
    eighth_line_x_l   -- legend text for the corresponding lines
    first_line_y
    second_line_y
    third_line_y     
    fourth_line_y    -- y coordinates of horizontal lines
    first_line_y_l
    second_line_y_l
    third_line_y_l   
    fourth_line_y_l  -- legend text for the corresponding lines
    """

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=title_offset)
    # Removing the top and right borders if so defined
    if rem_borders is True:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # Initializing an empy list to contain the legend text
    leg_text_l = []
    # Extracting the values given in parameters
    x = parameters[0]
    y = parameters[1]
    col_numb = parameters[2]
    leg_text = parameters[3]

    # Creating the bar plot
    plot = plt.bar(x, y, color=color_list[col_numb])

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    plt.xticks(fontsize=tick_fs, rotation=rot)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)
    plt.yticks(fontsize=tick_fs)

    # Adding vertical lines
    if first_line_x:
        plt.axvline(x=first_line_x, color=color_list[first_line_col],
                    linestyle=first_line_ls)
        leg_text_l.append(first_line_x_l)
    if second_line_x:
        plt.axvline(x=second_line_x, color=color_list[second_line_col],
                    linestyle=second_line_ls)
        leg_text_l.append(second_line_x_l)
    if third_line_x:        
        plt.axvline(x=third_line_x, color=color_list[third_line_col],
                    linestyle=third_line_ls)
        leg_text_l.append(third_line_x_l)
    if fourth_line_x:        
        plt.axvline(x=fourth_line_x, color=color_list[fourth_line_col],
                    linestyle=fourth_line_ls)
        leg_text_l.append(fourth_line_x_l)
    if fifth_line_x:        
        plt.axvline(x=fifth_line_x, color=color_list[fifth_line_col],
                    linestyle=fifth_line_ls)
        leg_text_l.append(fifth_line_x_l)
    if sixth_line_x:        
        plt.axvline(x=sixth_line_x, color=color_list[sixth_line_col], 
                    linestyle=sixth_line_ls)
        leg_text_l.append(sixth_line_x_l)
    if seventh_line_x:        
        plt.axvline(x=seventh_line_x, color=color_list[seventh_line_col], 
                    linestyle=seventh_line_ls)
        leg_text_l.append(seventh_line_x_l)
    if eighth_line_x:        
        plt.axvline(x=eighth_line_x, color=color_list[eighth_line_col], 
                    linestyle=eighth_line_ls)
        leg_text_l.append(eighth_line_x_l) 
        
    # Adding horizontal lines
    if first_line_y:
        plt.axhline(y=first_line_y, color='grey', linestyle=':')
        leg_text_l.append(first_line_y_l)
    if second_line_y:
        plt.axhline(y=second_line_y, color='grey', linestyle='--')
        leg_text_l.append(second_line_y_l)
    if third_line_y:
        plt.axhline(y=third_line_y, color='grey', linestyle='-.')
        leg_text_l.append(third_line_y_l)
    if fourth_line_y:
        plt.axhline(y=fourth_line_y, color='grey', linestyle='-.')
        leg_text_l.append(fourth_line_y_l)

    # Adding a legend
    if legend:
        leg_text_l.append(leg_text)
        plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Showing the plot without additional text
    plt.show()
In [7]:
def plot_stacked_bar(x, data, series_labels, col,
                     multidim=True, figsize_w=8, figsize_h=6,
                     title=None, title_fs=16,
                     frame=True,
                     category_labels=None,
                     label_fs=12, ticks_fs=12,
                     x_label=None, rot=0,
                     y_label=None,
                     legend=True, legend_loc=0, legend_fs=10,
                     add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10):
    """
    This function plots a stacked bar chart with the provided data and
    labels.

    Keyword arguments:
    x               -- A list containing the x values (mandatory)
    data            -- A list of lists where each internal list contains
                       data of a series (mandatory)
    series_labels   -- List of series labels (strings) (these appear in
                       the legend) (mandatory)
    col             -- A list of integers controlling the colors of the series
                       (mandatory)
    multidim        -- Defines if data is multidimensional (default is True)
    figsize_w       -- The width of the plot area
    figsize_w       -- The height of the plot area
    title           -- A string containing the title of the chart
    title_fs        -- The title font size
    frame           -- If False, the figure frame is omitted as well as
                       ticks and labels on the y axis
    category_labels -- List of category labels (strings) (these appear
                       on the x-axis)
    label_fs        -- x and y axis labels' font size
    tick_fs         -- The tick values font size
    rot             -- The rotation of the x axisis label (numerical)
                       (the default is horizontal)
    y_label         -- Label for the y-axis (string)
    legend          -- If true it shows a legend
    legend_loc      -- Used to position the legend compared to the centre
                       of the plot
    legend_fs       -- Legend font size
    add_text        -- Additional text to be shown in a box (string)
    addtext_x       -- Used to position the additional text box
    addtext_y       -- Used to position the additional text box
    addtext_fs      -- Font size of the additional text
    """

    # Finding the number of categories
    if multidim:
        cat_number = len(data[0])
    else:
        cat_number = len(data)

    # Preparing the indexes for the x axis
    ind = list(range(cat_number))
    # Initializing a list
    axes = []
    # Defining a numpy array containing the y coordinates of the bars
    # (the bars of the first series are on the x axis)
    bar_base = np.zeros(cat_number)
    # Converting the list with the data into a numpy array
    data = np.array(data)

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=20)
    # Removing the frame and y axis ticks and values if so defined
    if frame is False:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # If category labes are provided, showing them on the x axis
    if category_labels:
        plt.xticks(ind, category_labels, fontsize=ticks_fs, rotation=rot)

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)

    if multidim:
        # Iterating through the dimensions of the array
        for i, row_data in enumerate(data):
            # Creating the bars
            axes.append(plt.bar(x, row_data, bottom=bar_base,
                                color=color_list[col[i]],
                                label=series_labels[i]))
            # Incrementing the bar base height for the next series
            # by the height of the bar of the previous series
            bar_base += row_data
    else:
        # Creating the bars
        axes.append(plt.bar(x, data))

    # Creating a legend
    if legend:
        plt.legend(fontsize=legend_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Adding a text box with additional information
    if add_text:
        box_style = dict(facecolor='white')
        plt.gcf().text(addtext_x, addtext_y,
                       add_text,
                       fontsize=addtext_fs, bbox=box_style)

    # Showing the plot without additional text
    plt.show()
In [8]:
def plot_cust_hbar(data,
                   figsize_w=8, figsize_h=6,
                   frame=True, grid=False,
                   ref_font_size=12,
                   title_text=None,
                   title_offset=20,
                   color_numb=0,
                   categ_labels=True,
                   labels=None,
                   rot=0,
                   show_values=False,
                   omitted_value=0,
                   percent=False,
                   center_al=True,
                   visible_digits=2):
    """
    This function plots a horizontal bar charts for the provided data with
    the provided labels and settings.

    Keyword arguments:
    data            -- A sorted Series that contains categorical data
                       (mandatory)
    figsize_w       -- The width of the plot area
    figsize_h       -- The height of the plot area
    frame           -- If False, the figure frame is omitted as well as
                       ticks and labels on the y axis (default is True)
    grid            -- If True a horizontal grid is displayed. It works
                       only when frame=True (default is False)
    ref_font_size   -- Reference font size used for all the fonts
    title_text      -- A string containing the title of the chart
    title_offset    -- The offset of the title from the rest of the plot
    color_numb      -- An integer between 0 and 9 that indicated the
                       seaborn-deep color to be used for the bars
    categ_labels    -- A boolean variable that defines if category labels
                       shall appear (on the y-axis)
    labels          -- List of category labels (strings) used only if
                       categ_labels=True.
                       They override the existing labels
    rot             -- The rotation of the x axsis label (numerical)
                       (the default is horizontal)
    show_values     -- If True, then numeric value labels will be shown on
                       each bar (default is False)
    omitted_value   -- The max value that shall not be shown in the bar
    percent         -- If true, it indicates that the values are in percentage
                       (default is False)
    center_al       -- A boolean variable that defines if the values shall be
                       written in the centre of the bar (default is True)
    visible_digits  -- Integer defining the number of decimal digits
                       to be seen in the value labels (the default is 2)
    """

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Defining the suffix to be shown in the bar values
    if percent:
        p = '%'
    else:
        p = ""

    # Preparing the indexes for the x axis
    ind = list(range(len(data)))

    # Creating a new figure
    fig = plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')

    # Removing y axis ticks
    plt.gca().yaxis.set_ticks_position('none')

    if frame is False:
        # Removing the borders, if so defined
        sns.despine(top=True, right=True, left=True, bottom=True)
        # Removing ticks and values in the x axes
        plt.gca().axes.get_xaxis().set_visible(False)
    elif grid:
        # Showing a vertical grig, if so defined
        plt.gca().xaxis.grid(color='grey', alpha=0.25,
                             linestyle='-', linewidth=1)

    # Adding a title (with some distance to the top of the plot)
    plt.title(title_text, fontsize=ref_font_size*1.33,
              loc='center', pad=title_offset)

    # Creating the bar plot
    plot = plt.barh(ind, data, color=color_list[color_numb])

    # Showing category labels on the y axes, if so defined
    if categ_labels:
        # Overriding the index value if category labels are provided
        if labels:
            plt.yticks(ind, labels, fontsize=ref_font_size, rotation=rot)
        else:
            plt.yticks(ind, data.index.tolist(),
                       fontsize=ref_font_size, rotation=rot)
    else:
        # Removing ticks and values in the y axes
        plt.gca().axes.get_yaxis().set_visible(False)

    # Showing the bar values, if so defined
    if show_values:
        # Iterating through the bars in the plot
        for bar in plot:
            # Getting bar height and width
            w, h = bar.get_width(), bar.get_height()
            # Printing the values only if they are bigger than the defined value
            if w > omitted_value:
                if center_al is True:
                    # Positioning the text in the centre of the bar horizontally
                    # and vertically
                    plt.text(bar.get_x() + w/2, bar.get_y() + h/2,
                             "{}".format(round(w, visible_digits))+p,
                             fontsize=ref_font_size, color="white",
                             ha="center", va="center")
                else:
                    # Positioning the text at the right of the bar horizontally
                    # and in the centre vertically
                    plt.text(bar.get_x() + w, bar.get_y() + h/2,
                             "{}".format(round(w, visible_digits))+p,
                             fontsize=ref_font_size,
                             ha="left", va="center")

    # Showing the plot without additional text
    plt.show()

3.3. Project-specific Functions

In [9]:
def find_last_day():
    '''
    This function reads in a certain directory to find the latest CSV file
    and returns the date of the last file in a string in the format mm-dd-yyyy
    '''

    # Getting the list of files in the daily reports folder
    for roots, dirs, files in os.walk('JHU_COVID-19/COVID-19/'
                                      'csse_covid_19_data/'
                                      'csse_covid_19_daily_reports'):
        file_list = files  # list of strings
        # Initializing a new list
        dates = []
        # Iterating through the original list
        for i in list(range(len(file_list))):
            file = file_list[i]
            # If is it a csv file ...
            if re.search("\S+[csv]", file):
                # Extracting the date into a list of string
                date = re.findall("[0-9]+[-][0-9]+[-][0-9]+", file)
                # Converting the format from string to date
                dt_date = dt.datetime.strptime(date[0], "%m-%d-%Y")
                # Appending the date to a list of dates (the new list)
                dates.append(dt_date)
    # Sorting the dates and taking the last one
    dates.sort(reverse=True)
    latest = dates[0]  # datetime
    # Converting the latest date to a string
    last_day = latest.strftime("%m-%d-%Y")

    return last_day
In [10]:
def extract_country(Country, State="Not applicable", days=0):
    '''
    This function allows selecting data related to a specific Country
    from the datasets produced by JHU.
    It takes the following input:
    - a string containing the Country name written with the first letter
    as a capital letter (mandatory)
    - a string containing the State name written with the first letter
    as a capital letter (default = "Not applicable")
    - an integer containing how many days to skip (default = 0)
    It returns a tuple of 2 lists containing data related to confirmed
    and deceased cases.
    '''

    # Extracting confirmed cases
    confirm = world_conf_clean[(world_conf_clean['Country/Region'] == Country) &
                               (world_conf_clean['Province/State'] == State)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    confirm = confirm.iloc[:, 4+days:]
    # Copying the result into a list
    confirm_l = confirm.values.tolist()[0]
    
    # Extracting recovered cases
    recov = world_recov_clean[(world_recov_clean['Country/Region'] == Country) &
                              (world_recov_clean['Province/State'] == State)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    recov = recov.iloc[:, 4+days:]
    # Copying the result into a list
    recov_l = recov.values.tolist()[0]

    # Extracting deceased cases
    deceas = world_deceas_clean[(world_deceas_clean['Country/Region'] ==
                                 Country) &
                                (world_deceas_clean['Province/State'] == State)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    deceas = deceas.iloc[:, 4+days:]
    # Copying the result into a list
    deceas_l = deceas.values.tolist()[0]

    return confirm_l, recov_l, deceas_l
In [11]:
def prep_country_data(Country, State="Not applicable", days=0):
    '''
    This function allows to prepare the data for a specific Country.

    It takes the following inputs:

    - a string variable that contains the name of the Country
    written with the first letter as a capital letter (mandatory)
    - a string containing the State name written with the first letter
    as a capital letter (default = "Not applicable")
    - an integer that tells the number of initial days in the time series
    to skip (default = 0)

    The function uses the following functions:

    - 'extract_country' to extract Country-specific information from
    the relevant dataframes
    - 'calc_increments' to calculate the daily increments in a time series
    - 'extract_non_null' to extract only the non null values of a time series

    The output is a tuple with the following content:

    - a list containing a time series with the cumulative confirmed cases
    - a list containing a time series with the cumulative deceased cases
    - a list containing a time series with the daily increment
    in the confirmed cases
    - a list containing a time series with the cumulative confirmed cases
    starting from the day of the first positive case
    '''

    # Getting the name of thew Country in small letters
    country = Country.lower()
    
    '''
    # Creating descriptive file names
    countryname_hiddendays = "{}_{}". format(country, days)
    countryname_conf_hiddend = "{}_conf_{}". format(country, days)
    countryname_recov_hiddend = "{}_recov_{}". format(country, days)
    countryname_deceas_hiddend = "{}_deceas_{}". format(country, days)
    countryname_conf_incr_hiddend = "{}_conf_incr_{}". format(country, days)
    countryname_conf_pos = "{}_conf_pos". format(country)
    '''
    
    # Extracting country-speficic data by using the function extract_country
    countryname_hiddendays = extract_country(Country, State, days)
    # Extracting the time series for the cumulative confirmed cases
    countryname_conf_hiddend = countryname_hiddendays[0]
    # Extracting the time series for the cumulative recovered cases
    countryname_recov_hiddend = countryname_hiddendays[1]
    # Extracting the time series for the cumulative deceased cases
    countryname_deceas_hiddend = countryname_hiddendays[2]
    # Extracting the time series for the daily increments in the confirmed cases
    countryname_conf_incr_hiddend = calc_increments(countryname_conf_hiddend)
    # Extracting the complete time series about the cumulative confirmed cases
    complete_conf_series = extract_country(Country, State, 0)
    # Extracting the time series for the cumulative confirmed cases
    # starting from the day of the first positive case
    countryname_conf_pos = extract_non_null(complete_conf_series[0])

    return countryname_conf_hiddend, \
           countryname_recov_hiddend, \
           countryname_deceas_hiddend, \
           countryname_conf_incr_hiddend, \
           countryname_conf_pos
In [12]:
def extract_non_null(input_list):
    '''
    This function takes as input a list that contains a certain number of
    zero values, omits such values and returns what is left in a new list.
    '''

    # Initializing a list
    no_null = []
    # Looping through all the elements of the list
    for i in list(range(len(input_list))):
        if input_list[i] != 0:
            # Extracting non null values
            no_null.append(input_list[i])

    return no_null
In [13]:
def pop_perc(values, pop):
    '''
    This function takes the following inputs:
    - a list of floats in units
    - a float in million of units

    The function calculates the percentage values of the values in the list
    compared to the value in the single float miltiplied one million times.
    The function is useful, for example, to calculate the number of
    confirmed Coronavirus cases pro capite
    (in percentage of the total pupulation in millions).

    The function retunts a list of floats.
    '''

    result = (pd.Series(values)/(pop*1000000))*100

    return result
In [14]:
def find_error_days(listname):
    '''
    This function:
    takes a list,
    finds if the list contains negative increments by using the function find_neg_increm(listname),
    compares the position of such negative increments to the position of the days in the list days_tot and
    returns the corresponding days in a new list
    '''
    
    
    # Initializing a list to contain the positions in the list containing negative increments
    posit = []
    # Initializing a list to contain the days corresponding to negative increments in the list
    result = []
    # Checking for negative increments in the input list and storing their positions
    for position, item in enumerate(find_neg_increm(listname)):
        if item == 1:
            posit.append(position)
    # Finding the corresponding day 
    for position, item in enumerate(days_tot):
        if position in posit:
            result.append(item)
    print(result)

4. Dumping and Collecting the Data

The source csv files are located in the following directoryies:

  • JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_time_series
  • JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports

Those directory shall be located under the directory containing this notebook.

In [15]:
# Loading the data files into pandas dataframes
# Loading the world time series
world_confirmed = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                              'csse_covid_19_time_series/'
                              'time_series_covid19_confirmed_global.csv')
world_recovered = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                             'csse_covid_19_time_series/'
                             'time_series_covid19_recovered_global.csv')
world_deceased = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                             'csse_covid_19_time_series/'
                             'time_series_covid19_deaths_global.csv')
In [16]:
# Uploading the latest daily report
last_day = find_last_day()  # calling the function last_day
daily_report = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                           'csse_covid_19_daily_reports/' + last_day + '.csv')

File descriptions

  • time_series_covid19_confirmed_global.csv: confirmed cases for each day for each Country
  • time_series_covid19_recovered_global.csv: recovered cases for each day for each Country
  • time_series_covid19_deaths_global.csv: confirmed cases for each day for each Country
  • mm-dd-yyyy.csv: last available daily report
In [17]:
# Storing the total population for the Countries of interest (in millions)
# (source: Google)
italy_pop = 60.48
spain_pop = 46.66
germany_pop = 82.79
france_pop = 66.99
switzerland_pop = 8.57
netherlands_pop = 17.18
austria_pop = 8.822
belgium_pop = 11.4
portugal_pop = 10.29
luxembourg_pop = 0.602
poland_pop = 37.97
ireland_pop = 4.904
estonia_pop = 1.328
denmark_pop = 5.603
norway_pop = 5.368
sweden_pop = 10.12
iceland_pop = 0.364
finland_pop = 5.513
uk_pop = 66.44
us_pop = 327.2
hubei_pop = 58.5
china_pop = 1386
restchina_pop = china_pop-hubei_pop
brazil_pop = 212.559
russia_pop = 145.9
india_pop = 1380
In [18]:
# Storing the population density for the Countries of interest (people/km2)
# (source: Google)
italy_dens = 201.3
spain_dens = 91.4
germany_dens = 240
france_dens = 122.34
switzerland_dens = 219
netherlands_dens = 488
austria_dens = 109
belgium_dens = 383
portugal_dens = 111
luxembourg_dens = 242
poland_dens = 124
ireland_dens = 72
estonia_dens = 31
denmark_dens = 134
norway_dens = 15
sweden_dens = 25
iceland_dens = 3
finland_dens = 15
uk_dens = 274
us_dens = 36
hubei_dens = 310
china_dens = 145
brazil_dens = 25
russia_dens = 8.54
india_dens = 464
In [19]:
# Storing the median age for the Countries of interest
# source: https://en.wikipedia.org/wiki/List_of_countries_by_median_age
italy_median_age = 45.5
spain_median_age = 42.7
france_median_age = 41.4
switzerland_median_age = 42.4
netherlands_median_age = 42.6
austria_median_age = 44.0
belgium_median_age = 41.4
portugal_median_age = 42.2
luxembourg_median_age = 39.3
poland_median_age = 39.7
ireland_median_age = 36.5
estonia_median_age = 41.6
denmark_median_age = 42.2
norway_median_age = 39.2
sweden_median_age = 41.2
iceland_median_age = 36.5
finland_median_age = 42.5
uk_median_age = 40.5
us_median_age = 38.1
china_median_age = 37.4
brazil_median_age = 31.4
russia_median_age = 38.6
india_median_age = 26.8
In [20]:
# List of containment actions taken by the Finnish Government

# Creating a dataframe
measures = pd.DataFrame(columns=['Date', 'Actions'])

# Adding the actions
measures = measures.append(pd.Series(["12.3.",
"First containment measures: gathering of more than 500 people banned"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["16.3.",
"State of emergency declared: closing shools, universities, museums, theatres, \
libraries, sport facilities; gathering of more than 10 people banned"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["28.3.",
"Additional containment measures: Uusimaa region borders closed,  \
restaurant dining forbidden"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["11.4.",
"Additional containment measures: No passengers in ships from Germany, Sweden, Estonia"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["15.4.",
"First releasing measures: Uusima border re-opened"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["14.5.",
"More releasing misures: schools opening, business travell allowed within Schengen"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["1.6.",
"Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, \
reopening of museums and theatres"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["15.6.",
"End of state of emergency"],
index=measures.columns), ignore_index=True)

5. Data Analysis

5.1. Summary

Preliminary Data Analysis

The 3 time series files have columns for Province/State, Country/Region, latitude, longitude and data for each day. The columns related to the day are named in the format m/d/yy.

Each entry represents a different location. One Country can be associated with more than one State/Province and in this case one Country has more than one entry. This happens for US, China, Canada, France, Australia, United Kingdom, Netherlands and Denmark.

The daily report file has columns for Province/State, Country/Region, latitude, longitude and time stamp as well as cumulative confirmed, deaths and recovered cases.

Data Cleansing

NaN values have been handled by filling with the string "Not applicable".

Data Preparation

Separate datasets with no GPS coordinates and no time stamp have been created.

Separate datasets have been created to group data by Country.

A list of relevant dates for the plots has been created.

Country specific data has been extracted.

World-wide grand totals have been calculated.

A summary of the created datasets is available in section 5.5.

5.2. Preliminary Data Analysis

In [21]:
# Showing basic dataframe info
df_basic_data(world_confirmed)
Dataframe name: world_confirmed 

Dataframe length: 266 

Number of columns: 162 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[21]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 81 185
Country/Region object 188 0
Lat float64 257 0
Long float64 260 0
1/22/20 int64 11 0
... ... ... ...
6/23/20 int64 249 0
6/24/20 int64 247 0
6/25/20 int64 249 0
6/26/20 int64 249 0
6/27/20 int64 248 0

162 rows × 3 columns

In [22]:
# Showing basic dataframe info
df_basic_data(world_recovered)
Dataframe name: world_recovered 

Dataframe length: 253 

Number of columns: 162 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[22]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 67 186
Country/Region object 188 0
Lat float64 246 0
Long float64 248 0
1/22/20 int64 2 0
... ... ... ...
6/23/20 int64 232 0
6/24/20 int64 233 0
6/25/20 int64 232 0
6/26/20 int64 231 0
6/27/20 int64 233 0

162 rows × 3 columns

In [23]:
# Showing basic dataframe info
df_basic_data(world_deceased)
Dataframe name: world_deceased 

Dataframe length: 266 

Number of columns: 162 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[23]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 81 185
Country/Region object 188 0
Lat float64 257 0
Long float64 260 0
1/22/20 int64 2 0
... ... ... ...
6/23/20 int64 137 0
6/24/20 int64 135 0
6/25/20 int64 138 0
6/26/20 int64 139 0
6/27/20 int64 135 0

162 rows × 3 columns

In [24]:
# Showing basic dataframe info
df_basic_data(daily_report)
Dataframe name: daily_report 

Dataframe length: 3783 

Number of columns: 14 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[24]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
FIPS float64 3101 682
Admin2 object 1799 678
Province_State object 548 169
Country_Region object 188 0
Last_Update object 1 0
Lat float64 3708 74
Long_ float64 3698 74
Confirmed int64 1387 0
Deaths int64 417 0
Recovered int64 530 0
Active int64 1244 0
Combined_Key object 3783 0
Incidence_Rate float64 3680 74
Case-Fatality_Ratio float64 1784 58
In [25]:
# Checking how data looks like
print("world_confirmed")
world_confirmed.head()
world_confirmed
Out[25]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 5 7 7 7 11 16 21 22 22 22 24 24 40 40 74 84 94 110 110 120 170 174 237 273 281 299 349 367 423 444 484 521 555 607 665 714 784 840 906 933 996 1026 1092 1176 1279 1351 1463 1531 1703 1828 1939 2171 2335 2469 2704 2894 3224 3392 3563 3778 4033 4402 4687 4963 5226 5639 6053 6402 6664 7072 7653 8145 8676 9216 9998 10582 11173 11831 12456 13036 13659 14525 15205 15750 16509 17267 18054 18969 19551 20342 20917 21459 22142 22890 23546 24102 24766 25527 26310 26874 27532 27878 28424 28833 29157 29481 29640 30175 30451 30616
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 10 12 23 33 38 42 51 55 59 64 70 76 89 104 123 146 174 186 197 212 223 243 259 277 304 333 361 377 383 400 409 416 433 446 467 475 494 518 539 548 562 584 609 634 663 678 712 726 736 750 766 773 782 789 795 803 820 832 842 850 856 868 872 876 880 898 916 933 946 948 949 964 969 981 989 998 1004 1029 1050 1076 1099 1122 1137 1143 1164 1184 1197 1212 1232 1246 1263 1299 1341 1385 1416 1464 1521 1590 1672 1722 1788 1838 1891 1962 1995 2047 2114 2192 2269 2330
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 3 5 12 12 17 17 19 20 20 20 24 26 37 48 54 60 74 87 90 139 201 230 264 302 367 409 454 511 584 716 847 986 1171 1251 1320 1423 1468 1572 1666 1761 1825 1914 1983 2070 2160 2268 2418 2534 2629 2718 2811 2910 3007 3127 3256 3382 3517 3649 3848 4006 4154 4295 4474 4648 4838 4997 5182 5369 5558 5723 5891 6067 6253 6442 6629 6821 7019 7201 7377 7542 7728 7918 8113 8306 8503 8697 8857 8997 9134 9267 9394 9513 9626 9733 9831 9935 10050 10154 10265 10382 10484 10589 10698 10810 10919 11031 11147 11268 11385 11504 11631 11771 11920 12076 12248 12445 12685 12968
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 39 39 53 75 88 113 133 164 188 224 267 308 334 370 376 390 428 439 466 501 525 545 564 583 601 601 638 646 659 673 673 696 704 713 717 717 723 723 731 738 738 743 743 743 745 745 747 748 750 751 751 752 752 754 755 755 758 760 761 761 761 761 761 761 762 762 762 762 762 763 763 763 763 764 764 764 765 844 851 852 852 852 852 852 852 852 852 853 853 853 853 854 854 855 855 855 855 855 855 855 855 855 855
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 3 3 3 4 4 5 7 7 7 8 8 8 10 14 16 17 19 19 19 19 19 19 19 19 19 19 24 24 24 24 25 25 25 25 26 27 27 27 27 30 35 35 35 36 36 36 43 43 45 45 45 45 48 48 48 48 50 52 52 58 60 61 69 70 70 71 74 81 84 86 86 86 86 86 86 88 91 92 96 113 118 130 138 140 142 148 155 166 172 176 183 186 189 197 212 212 259
In [26]:
# Checking how data looks like
print("world_recovered")
world_recovered.head()
world_recovered
Out[26]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 5 5 10 10 10 15 18 18 29 32 32 32 32 32 40 43 54 99 112 131 135 150 166 179 188 188 207 220 228 252 260 310 331 345 397 421 458 468 472 502 558 558 610 648 691 745 745 778 801 850 930 938 996 1040 1075 1097 1128 1138 1209 1259 1303 1328 1428 1450 1522 1585 1762 1830 1875 2171 2651 3013 3326 3928 4201 4725 5164 5508 6158 7660 7962 8292 8764 8841 9260 9869 10174 10306 10674
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 10 17 17 31 31 33 44 52 67 76 89 99 104 116 131 154 165 182 197 217 232 248 251 277 283 302 314 327 345 356 385 394 403 410 422 431 455 470 488 519 531 543 570 595 605 620 627 650 654 682 688 694 705 714 715 727 742 758 771 777 783 789 795 803 812 823 851 857 872 877 891 898 898 910 925 938 945 960 980 1001 1034 1039 1044 1055 1064 1077 1086 1114 1126 1134 1159 1195 1217 1250 1298 1346
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 8 12 12 12 12 12 32 32 32 65 65 24 65 29 29 31 31 37 46 61 61 62 90 90 90 113 237 347 405 460 591 601 691 708 783 846 894 1047 1099 1152 1204 1355 1408 1479 1508 1558 1651 1702 1779 1821 1872 1936 1998 2067 2197 2323 2467 2546 2678 2841 2998 3058 3158 3271 3409 3507 3625 3746 3968 4062 4256 4426 4784 4747 4918 5129 5277 5422 5549 5748 5894 6067 6218 6297 6453 6631 6717 6799 6951 7074 7255 7322 7420 7606 7735 7842 7943 8078 8196 8324 8422 8559 8674 8792 8920 9066 9202
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 10 10 10 16 21 26 31 39 52 58 71 71 128 128 128 169 169 191 205 235 248 282 309 333 344 344 344 385 398 423 468 468 472 493 499 514 521 526 537 545 550 550 568 576 596 604 615 617 624 628 639 639 652 653 653 663 676 676 681 684 692 694 698 733 735 738 741 741 744 751 757 759 780 781 781 781 789 789 791 792 792 792 792 796 797 797 797 799 799
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 2 4 4 4 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 7 7 11 11 11 11 11 11 11 11 13 13 13 13 14 14 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18 18 21 24 24 38 38 40 41 42 61 61 64 64 64 64 66 66 77 77 77 77 81 81 81
In [27]:
# Checking how data looks like
print("world_deceased")
world_deceased.head()
world_deceased
Out[27]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 4 4 4 4 4 4 4 6 6 7 7 11 14 14 15 15 18 18 21 23 25 30 30 30 33 36 36 40 42 43 47 50 57 58 60 64 68 72 85 90 95 104 106 109 115 120 122 127 132 136 153 168 169 173 178 187 193 205 216 218 219 220 227 235 246 249 257 265 270 294 300 309 327 357 369 384 405 426 446 451 471 478 491 504 546 548 569 581 598 618 639 675 683 703
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 4 5 5 6 8 10 10 11 15 15 16 17 20 20 21 22 22 23 23 23 23 23 24 25 26 26 26 26 26 26 27 27 27 27 28 28 30 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 32 32 33 33 33 33 33 33 33 33 33 33 33 34 34 34 34 34 35 36 36 36 36 37 38 39 42 43 44 44 45 47 49 51 53
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 4 4 7 9 11 15 17 17 19 21 25 26 29 31 35 44 58 86 105 130 152 173 193 205 235 256 275 293 313 326 336 348 364 367 375 384 392 402 407 415 419 425 432 437 444 450 453 459 463 465 470 476 483 488 494 502 507 515 522 529 536 542 548 555 561 568 575 582 592 600 609 617 623 630 638 646 653 661 667 673 681 690 698 707 715 724 732 741 751 760 767 777 788 799 811 825 837 845 852 861 869 878 885 892
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 3 3 3 6 8 12 14 15 16 17 18 21 22 23 25 26 26 29 29 31 33 33 35 35 36 37 37 37 37 40 40 40 40 41 42 42 43 44 45 45 46 46 47 47 48 48 48 48 49 49 49 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 52 52 52 52 52 52 52 52 52 52 52 52
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 6 6 6 6 7 8 8 9 9 10 10 10 10 10 10
In [28]:
# Checking how data looks like
print("daily_report")
daily_report.head()
daily_report
Out[28]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
0 45001.0 Abbeville South Carolina US 2020-06-28 04:34:04 34.223334 -82.461707 103 0 0 103 Abbeville, South Carolina, US 419.945366 0.000000
1 22001.0 Acadia Louisiana US 2020-06-28 04:34:04 30.295065 -92.414197 783 36 0 747 Acadia, Louisiana, US 1261.987267 4.597701
2 51001.0 Accomack Virginia US 2020-06-28 04:34:04 37.767072 -75.632346 1039 14 0 1025 Accomack, Virginia, US 3215.125634 1.347449
3 16001.0 Ada Idaho US 2020-06-28 04:34:04 43.452658 -116.241552 1841 23 0 1818 Ada, Idaho, US 382.277761 1.249321
4 19001.0 Adair Iowa US 2020-06-28 04:34:04 41.330756 -94.471059 14 0 0 14 Adair, Iowa, US 195.749441 0.000000
In [29]:
# Checking the Countries that are associated to more than one entry
print("Countries that are associated to more than one entry and number of entries\n")
print(daily_report['Country_Region'].value_counts().head(8).to_string())
Countries that are associated to more than one entry and number of entries

US          3112
Russia        83
Japan         48
India         36
Colombia      34
China         33
Mexico        32
Brazil        27
In [30]:
# Checking the logic behind the classification
daily_report[daily_report['Country_Region'] == "Denmark"]
Out[30]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
3232 NaN NaN Faroe Islands Denmark 2020-06-28 04:34:04 61.8926 -6.9118 187 0 187 0 Faroe Islands, Denmark 382.686995 0.000000
3251 NaN NaN Greenland Denmark 2020-06-28 04:34:04 71.7069 -42.6043 13 0 13 0 Greenland, Denmark 22.898612 0.000000
3655 NaN NaN NaN Denmark 2020-06-28 04:34:04 56.2639 9.5018 12675 604 11508 563 Denmark 218.828656 4.765286

For France, United Kingdom, Netherlands and Denmark, in order to get the data related to the main land it is enough to search for Country_Region = countryname and Province_State = NaN.

This excludes from UK the Isle of Man and Channel Islands.

For Australia, it is enough to sum up all the entries where Country_Region = countryname. This includes Tasmania.

The same can be done for China and this will include also Hainan and Hong Kong.

For Canada, summing all the entries include also the people from Diamond Princess and Grand Princes ships, we well as Prince Edward Island population.

The procedure to follow for US shall still be determined. So far all the entries have been added up.

In [31]:
print("Population of different Countries in million (source: Google):\n\n",
     "Italy:", italy_pop, "\n",
     "Spain:", spain_pop, "\n",
     "Germany:", germany_pop, "\n",
     "France:", france_pop, "\n",
     "Switzerland:", switzerland_pop, "\n",
     "Netherlands:", netherlands_pop, "\n",
     "Austria:", austria_pop, "\n",
     "Belgium:", belgium_pop, "\n",
     "Portugal:", portugal_pop, "\n",
     "Luxembourg:", luxembourg_pop, "\n",
     "Poland:", poland_pop, "\n",
     "Ireland:", ireland_pop, "\n",
     "Estonia:", estonia_pop, "\n",
     "Denmark:", denmark_pop, "\n",
     "Norway:", norway_pop, "\n",
     "Sweden:", sweden_pop, "\n",
     "Iceland:", iceland_pop, "\n",
     "Finland:", finland_pop, "\n",
     "UK:", uk_pop, "\n",
     "Brazil:", brazil_pop, "\n",
     "Russia:", russia_pop, "\n",
     "India:", india_pop, "\n")

print("NOTE: those figures are approximative.")
Population of different Countries in million (source: Google):

 Italy: 60.48 
 Spain: 46.66 
 Germany: 82.79 
 France: 66.99 
 Switzerland: 8.57 
 Netherlands: 17.18 
 Austria: 8.822 
 Belgium: 11.4 
 Portugal: 10.29 
 Luxembourg: 0.602 
 Poland: 37.97 
 Ireland: 4.904 
 Estonia: 1.328 
 Denmark: 5.603 
 Norway: 5.368 
 Sweden: 10.12 
 Iceland: 0.364 
 Finland: 5.513 
 UK: 66.44 
 Brazil: 212.559 
 Russia: 145.9 
 India: 1380 

NOTE: those figures are approximative.
In [32]:
print("Density of population of different Countries in people per square kilometre\n"\
      "(source: Google):\n\n",
     "Italy:", italy_dens, "\n",
     "Spain:", spain_dens, "\n",
     "Germany:", germany_dens, "\n",
     "France:", france_dens, "\n",
     "Switzerland:", switzerland_dens, "\n",
     "Netherlands:", netherlands_dens, "\n",
     "Austria:", austria_dens, "\n",
     "Belgium:", belgium_dens, "\n",
     "Portugal:", portugal_dens, "\n",
     "Luxembourg:", luxembourg_dens, "\n",
     "Poland:", poland_dens, "\n",
     "Ireland:", ireland_dens, "\n",
     "Estonia:", poland_dens, "\n",
     "Denmark:", denmark_dens, "\n",
     "Norway:", norway_dens, "\n",
     "Sweden:", sweden_dens, "\n",
     "Iceland:", iceland_dens, "\n",
     "Finland:", finland_dens, "\n",
     "UK:", uk_dens, "\n",
     "Brazil:", brazil_dens, "\n",
     "Russia:", russia_dens, "\n",
     "India:", india_dens, "\n")

print("NOTE: those figures are approximative.")
Density of population of different Countries in people per square kilometre
(source: Google):

 Italy: 201.3 
 Spain: 91.4 
 Germany: 240 
 France: 122.34 
 Switzerland: 219 
 Netherlands: 488 
 Austria: 109 
 Belgium: 383 
 Portugal: 111 
 Luxembourg: 242 
 Poland: 124 
 Ireland: 72 
 Estonia: 124 
 Denmark: 134 
 Norway: 15 
 Sweden: 25 
 Iceland: 3 
 Finland: 15 
 UK: 274 
 Brazil: 25 
 Russia: 8.54 
 India: 464 

NOTE: those figures are approximative.
In [33]:
print("Median age of different Countries (source: Wikipedia):\n\n",
      "Finland:", finland_median_age, "\n",
      "Denmark:", denmark_median_age, "\n",
      "Norwayd:", norway_median_age, "\n",
      "Sweden:", sweden_median_age, "\n",
      "Iceland:", iceland_median_age, "\n",
      "Italy:", italy_median_age, "\n",
      "Spain:", spain_median_age, "\n",
      "France:", france_median_age, "\n",
      "Switzerland:", switzerland_median_age, "\n",
      "Netherlands:", netherlands_median_age, "\n",
      "Austria:", austria_median_age, "\n",
      "Belgium:", belgium_median_age, "\n",
      "Portugal:", portugal_median_age, "\n",
      "Luxembourg:", luxembourg_median_age, "\n",
      "Polandd:", poland_median_age, "\n",
      "Ireland:", ireland_median_age, "\n",
      "Estonia:", estonia_median_age, "\n",
      "Brazil:", brazil_median_age, "\n",
      "Russia:", russia_median_age, "\n",
      "India:", india_median_age, "\n")
      
print("NOTE: those figures are from year 2018.")
Median age of different Countries (source: Wikipedia):

 Finland: 42.5 
 Denmark: 42.2 
 Norwayd: 39.2 
 Sweden: 41.2 
 Iceland: 36.5 
 Italy: 45.5 
 Spain: 42.7 
 France: 41.4 
 Switzerland: 42.4 
 Netherlands: 42.6 
 Austria: 44.0 
 Belgium: 41.4 
 Portugal: 42.2 
 Luxembourg: 39.3 
 Polandd: 39.7 
 Ireland: 36.5 
 Estonia: 41.6 
 Brazil: 31.4 
 Russia: 38.6 
 India: 26.8 

NOTE: those figures are from year 2018.
In [34]:
pd.options.display.max_colwidth = 150

print("Containment actions by the Finnish Government:\n")
# Setting both text and column headers text aligned to the left
# and omitting the indexes
measures.style.set_properties(**{'text-align': 'left'}).\
set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ]).hide_index()
Containment actions by the Finnish Government:

Out[34]:
Date Actions
12.3. First containment measures: gathering of more than 500 people banned
16.3. State of emergency declared: closing shools, universities, museums, theatres, libraries, sport facilities; gathering of more than 10 people banned
28.3. Additional containment measures: Uusimaa region borders closed, restaurant dining forbidden
11.4. Additional containment measures: No passengers in ships from Germany, Sweden, Estonia
15.4. First releasing measures: Uusima border re-opened
14.5. More releasing misures: schools opening, business travell allowed within Schengen
1.6. Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, reopening of museums and theatres
15.6. End of state of emergency

5.3. Data Cleansing

In [35]:
# Converting null values in strings with value "Not applicable"
world_conf_clean = world_confirmed.fillna("Not applicable")
world_recov_clean = world_recovered.fillna("Not applicable")
world_deceas_clean = world_deceased.fillna("Not applicable")
daily_rep_clean = daily_report.fillna("Not applicable")

5.4. Data Preparation

5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries

In [36]:
# Dropping the GPS coordinates and storing the result in new datasets
world_conf_short = world_conf_clean.drop(['Lat', 'Long'], axis=1)
world_recov_short = world_recov_clean.drop(['Lat', 'Long'], axis=1)
world_deceas_short = world_deceas_clean.drop(['Lat', 'Long'], axis=1)
# Dropping the columns not related to the cases counters
daily_rep_short = daily_rep_clean.drop(['Lat',
                                        'Long_',
                                        'Last_Update',
                                        'FIPS',
                                        'Admin2',
                                        'Combined_Key'],\
                                       axis=1)

# Grouping by Province/State and storing the result in new datasets
world_conf_group = world_conf_short.groupby(['Country/Region']).sum()
world_recov_group = world_recov_short.groupby(['Country/Region']).sum()
world_deceas_group = world_deceas_short.groupby(['Country/Region']).sum()
daily_rep_group = daily_rep_short.groupby(['Country_Region']).sum()
In [37]:
# Creating a list of dates

# Extracting only the columns containing the virus cases data for each day
world_conf_data = world_confirmed.iloc[:,4:]
# Extracting the column values (dates) and putting them in a list
days_all = world_conf_data.columns.values.tolist()

# Initializing an empty list
days_tot = []
# Looping through the number of days
for i in list(range(len(days_all))):
    # Extracting day and month and taking just the string value
    new_element=re.findall("[0-9]+[/][0-9]+", days_all[i])[0]
    # Adding the result to the list
    days_tot.append(new_element)
    
print("List of days for the plots:\n")
days_tot
List of days for the plots:

Out[37]:
['1/22',
 '1/23',
 '1/24',
 '1/25',
 '1/26',
 '1/27',
 '1/28',
 '1/29',
 '1/30',
 '1/31',
 '2/1',
 '2/2',
 '2/3',
 '2/4',
 '2/5',
 '2/6',
 '2/7',
 '2/8',
 '2/9',
 '2/10',
 '2/11',
 '2/12',
 '2/13',
 '2/14',
 '2/15',
 '2/16',
 '2/17',
 '2/18',
 '2/19',
 '2/20',
 '2/21',
 '2/22',
 '2/23',
 '2/24',
 '2/25',
 '2/26',
 '2/27',
 '2/28',
 '2/29',
 '3/1',
 '3/2',
 '3/3',
 '3/4',
 '3/5',
 '3/6',
 '3/7',
 '3/8',
 '3/9',
 '3/10',
 '3/11',
 '3/12',
 '3/13',
 '3/14',
 '3/15',
 '3/16',
 '3/17',
 '3/18',
 '3/19',
 '3/20',
 '3/21',
 '3/22',
 '3/23',
 '3/24',
 '3/25',
 '3/26',
 '3/27',
 '3/28',
 '3/29',
 '3/30',
 '3/31',
 '4/1',
 '4/2',
 '4/3',
 '4/4',
 '4/5',
 '4/6',
 '4/7',
 '4/8',
 '4/9',
 '4/10',
 '4/11',
 '4/12',
 '4/13',
 '4/14',
 '4/15',
 '4/16',
 '4/17',
 '4/18',
 '4/19',
 '4/20',
 '4/21',
 '4/22',
 '4/23',
 '4/24',
 '4/25',
 '4/26',
 '4/27',
 '4/28',
 '4/29',
 '4/30',
 '5/1',
 '5/2',
 '5/3',
 '5/4',
 '5/5',
 '5/6',
 '5/7',
 '5/8',
 '5/9',
 '5/10',
 '5/11',
 '5/12',
 '5/13',
 '5/14',
 '5/15',
 '5/16',
 '5/17',
 '5/18',
 '5/19',
 '5/20',
 '5/21',
 '5/22',
 '5/23',
 '5/24',
 '5/25',
 '5/26',
 '5/27',
 '5/28',
 '5/29',
 '5/30',
 '5/31',
 '6/1',
 '6/2',
 '6/3',
 '6/4',
 '6/5',
 '6/6',
 '6/7',
 '6/8',
 '6/9',
 '6/10',
 '6/11',
 '6/12',
 '6/13',
 '6/14',
 '6/15',
 '6/16',
 '6/17',
 '6/18',
 '6/19',
 '6/20',
 '6/21',
 '6/22',
 '6/23',
 '6/24',
 '6/25',
 '6/26',
 '6/27']
In [38]:
# Listing the Countries
print("List of Countries:\n")
world_conf_group.index.to_list()
List of Countries:

Out[38]:
['Afghanistan',
 'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Burma',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo (Brazzaville)',
 'Congo (Kinshasa)',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Denmark',
 'Diamond Princess',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Finland',
 'France',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Grenada',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Guyana',
 'Haiti',
 'Holy See',
 'Honduras',
 'Hungary',
 'Iceland',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Ireland',
 'Israel',
 'Italy',
 'Jamaica',
 'Japan',
 'Jordan',
 'Kazakhstan',
 'Kenya',
 'Korea, South',
 'Kosovo',
 'Kuwait',
 'Kyrgyzstan',
 'Laos',
 'Latvia',
 'Lebanon',
 'Lesotho',
 'Liberia',
 'Libya',
 'Liechtenstein',
 'Lithuania',
 'Luxembourg',
 'MS Zaandam',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Maldives',
 'Mali',
 'Malta',
 'Mauritania',
 'Mauritius',
 'Mexico',
 'Moldova',
 'Monaco',
 'Mongolia',
 'Montenegro',
 'Morocco',
 'Mozambique',
 'Namibia',
 'Nepal',
 'Netherlands',
 'New Zealand',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'North Macedonia',
 'Norway',
 'Oman',
 'Pakistan',
 'Panama',
 'Papua New Guinea',
 'Paraguay',
 'Peru',
 'Philippines',
 'Poland',
 'Portugal',
 'Qatar',
 'Romania',
 'Russia',
 'Rwanda',
 'Saint Kitts and Nevis',
 'Saint Lucia',
 'Saint Vincent and the Grenadines',
 'San Marino',
 'Sao Tome and Principe',
 'Saudi Arabia',
 'Senegal',
 'Serbia',
 'Seychelles',
 'Sierra Leone',
 'Singapore',
 'Slovakia',
 'Slovenia',
 'Somalia',
 'South Africa',
 'South Sudan',
 'Spain',
 'Sri Lanka',
 'Sudan',
 'Suriname',
 'Sweden',
 'Switzerland',
 'Syria',
 'Taiwan*',
 'Tajikistan',
 'Tanzania',
 'Thailand',
 'Timor-Leste',
 'Togo',
 'Trinidad and Tobago',
 'Tunisia',
 'Turkey',
 'US',
 'Uganda',
 'Ukraine',
 'United Arab Emirates',
 'United Kingdom',
 'Uruguay',
 'Uzbekistan',
 'Venezuela',
 'Vietnam',
 'West Bank and Gaza',
 'Western Sahara',
 'Yemen',
 'Zambia',
 'Zimbabwe']

5.4.2. Population age data

In [39]:
# Creating a Pandas series containing median ages for different Countries in EU
countries_median_age = pd.Series({'Finland': finland_median_age,
                                  'Denmark': denmark_median_age,
                                  'Norway': norway_median_age,
                                  'Sweden': sweden_median_age,
                                  'Iceland': iceland_median_age,
                                  'Italy': italy_median_age,
                                  'Spain': spain_median_age,
                                  'France': france_median_age,
                                  'Switzerland': switzerland_median_age,
                                  'Netherlands': netherlands_median_age,
                                  'Austria': austria_median_age,
                                  'Belgium': belgium_median_age,
                                  'Portugal': portugal_median_age,
                                  'Luxembourg': luxembourg_median_age,
                                  'Poland': poland_median_age,
                                  'Ireland': ireland_median_age,
                                  'Estonia': estonia_median_age})
# Calculating the minimum value
median_age_min = countries_median_age.min()
# Calculating the maximum value
median_age_max = countries_median_age.max()
# Calculating the median age range
median_age_range = median_age_max - median_age_min
print("The range of the median age in the EU Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(median_age_range))
The range of the median age in the EU Countries that are analyzed here is: 9.0 years

5.4.3. World Data

In [40]:
# Selecting only the columns with the daily data
world_conf = world_conf_short.iloc[:,2:]
world_recov = world_recov_short.iloc[:,2:]
world_deceas = world_deceas_short.iloc[:,2:]
In [41]:
# Calculating cumulative worldwide data for each day
world_conf_tot = world_conf.sum()
world_recov_tot = world_recov.sum()
world_deceas_tot = world_deceas.sum()
In [42]:
# Calculating the active cases for each day
world_act_tot = list(np.array(world_conf_tot) - \
                     np.array(world_recov_tot) - \
                     np.array(world_deceas_tot))
In [43]:
# Calculating the daily increments in the deceased cases
world_conf_incr = calc_increments(world_conf_tot)
# Calculating the daily increments in the confirmed cases
world_deceas_incr = calc_increments(world_deceas_tot)
In [44]:
# Finding the cumulative per capita data worldwide
world_conf_perc = pop_perc(world_conf_tot, 7.8*1000)
world_deceas_perc = pop_perc(world_deceas_tot, 7.8*1000)

5.4.4. Finnish data

In [45]:
# Calling the function extract_country to extract data related to Finland
# (skipping the first 6 days since they contain no confirmed cases)
Finland_6 = extract_country("Finland", "Not applicable", 6)
# Extracting the confirmed cases
finland_conf_6 = Finland_6[0]
# Extracting the recovered cases
finland_recov_6 = Finland_6[1]
# Extracting the decased cases
finland_deceas_6 = Finland_6[2]

# Creating a list of days to use for Finnish charts
# (skipping the first 6 days)
days_fin = days_tot[6:]
In [46]:
print("Compact Finnish data set:\n")
print("first day:", days_fin[0])
print("number of days:", len(days_fin))
Compact Finnish data set:

first day: 1/28
number of days: 152
In [47]:
# Visualizing the complete series
print("Confirmed cases time series:")
finland_conf_6
Confirmed cases time series:
Out[47]:
[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 2,
 2,
 3,
 6,
 6,
 6,
 6,
 12,
 15,
 15,
 23,
 30,
 40,
 59,
 59,
 155,
 225,
 244,
 277,
 321,
 336,
 400,
 450,
 523,
 626,
 700,
 792,
 880,
 958,
 1041,
 1167,
 1240,
 1352,
 1418,
 1446,
 1518,
 1615,
 1882,
 1927,
 2176,
 2308,
 2487,
 2605,
 2769,
 2905,
 2974,
 3064,
 3161,
 3237,
 3369,
 3489,
 3681,
 3783,
 3868,
 4014,
 4129,
 4284,
 4395,
 4475,
 4576,
 4695,
 4740,
 4906,
 4995,
 5051,
 5176,
 5254,
 5327,
 5412,
 5573,
 5673,
 5738,
 5880,
 5962,
 5984,
 6003,
 6054,
 6145,
 6228,
 6286,
 6347,
 6380,
 6399,
 6443,
 6493,
 6537,
 6568,
 6579,
 6599,
 6628,
 6692,
 6743,
 6776,
 6826,
 6859,
 6885,
 6887,
 6911,
 6911,
 6941,
 6964,
 6981,
 7001,
 7025,
 7040,
 7064,
 7073,
 7087,
 7104,
 7108,
 7112,
 7117,
 7119,
 7133,
 7142,
 7143,
 7144,
 7155,
 7167,
 7172,
 7191,
 7198]
In [48]:
# Visualizing the complete series
print("Recovered cases time series:")
finland_recov_6
Recovered cases time series:
Out[48]:
[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 1700,
 1700,
 1700,
 1700,
 2000,
 2000,
 2000,
 2000,
 2500,
 2500,
 2500,
 2500,
 2800,
 2800,
 3000,
 3000,
 3000,
 3000,
 3500,
 3500,
 3500,
 3500,
 4000,
 4000,
 4000,
 4000,
 4300,
 4300,
 4300,
 5000,
 5000,
 5000,
 5000,
 5000,
 4800,
 4800,
 4800,
 4800,
 4800,
 5100,
 5100,
 5100,
 5500,
 5500,
 5500,
 5500,
 5500,
 5500,
 5500,
 5800,
 5800,
 5800,
 5800,
 5800,
 5800,
 5800,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6400,
 6400,
 6600,
 6600,
 6600,
 6600]
In [49]:
# Visualizing the complete series
print("Deceased cases time series:")
finland_deceas_6
Deceased cases time series:
Out[49]:
[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 3,
 5,
 7,
 9,
 11,
 13,
 17,
 17,
 19,
 20,
 25,
 28,
 27,
 34,
 40,
 42,
 48,
 49,
 56,
 59,
 64,
 72,
 75,
 82,
 90,
 94,
 98,
 141,
 149,
 172,
 177,
 186,
 190,
 193,
 199,
 206,
 211,
 218,
 220,
 230,
 240,
 246,
 252,
 255,
 260,
 265,
 267,
 271,
 275,
 284,
 287,
 293,
 297,
 298,
 300,
 301,
 304,
 306,
 306,
 306,
 307,
 308,
 312,
 313,
 313,
 314,
 316,
 320,
 318,
 320,
 321,
 322,
 322,
 322,
 323,
 323,
 324,
 324,
 325,
 325,
 325,
 326,
 326,
 326,
 326,
 326,
 326,
 326,
 326,
 327,
 327,
 327,
 327,
 328,
 328]
In [50]:
# Calculating the active cases
finland_act_6 = list(np.array(finland_conf_6) - \
                     np.array(finland_recov_6) - \
                     np.array(finland_deceas_6))

finland_act_6
Out[50]:
[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 2,
 5,
 5,
 5,
 5,
 11,
 14,
 14,
 22,
 29,
 39,
 58,
 58,
 154,
 224,
 234,
 267,
 311,
 326,
 390,
 440,
 512,
 615,
 689,
 781,
 867,
 943,
 1024,
 1148,
 1219,
 1329,
 1391,
 1419,
 1199,
 1295,
 1557,
 1599,
 1849,
 1974,
 2147,
 2263,
 2421,
 2556,
 2618,
 2705,
 2797,
 2865,
 1594,
 1707,
 1891,
 1989,
 1770,
 1873,
 1980,
 2112,
 1718,
 1789,
 1886,
 2002,
 1741,
 1900,
 1784,
 1833,
 1956,
 2024,
 1587,
 1666,
 1821,
 1918,
 1478,
 1615,
 1695,
 1713,
 1428,
 1470,
 1558,
 935,
 989,
 1049,
 1080,
 1098,
 1339,
 1387,
 1431,
 1462,
 1472,
 1191,
 1216,
 1279,
 930,
 962,
 1010,
 1039,
 1067,
 1067,
 1090,
 789,
 819,
 842,
 858,
 878,
 901,
 916,
 539,
 548,
 562,
 578,
 582,
 586,
 591,
 593,
 607,
 616,
 617,
 417,
 428,
 240,
 245,
 263,
 270]
In [51]:
# Creating a list of same lenght as days_fin containing the increment of
# the confirmed cases compared to the previous day (first derivate)
# This tells how quickly the confirmed cases are growing
finland_conf_incr_6 = calc_increments(finland_conf_6)

# Visualizing the all series
print("Daily increment in confirmed cases time series:")
finland_conf_incr_6
Daily increment in confirmed cases time series:
Out[51]:
[0.0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 3,
 0,
 0,
 0,
 6,
 3,
 0,
 8,
 7,
 10,
 19,
 0,
 96,
 70,
 19,
 33,
 44,
 15,
 64,
 50,
 73,
 103,
 74,
 92,
 88,
 78,
 83,
 126,
 73,
 112,
 66,
 28,
 72,
 97,
 267,
 45,
 249,
 132,
 179,
 118,
 164,
 136,
 69,
 90,
 97,
 76,
 132,
 120,
 192,
 102,
 85,
 146,
 115,
 155,
 111,
 80,
 101,
 119,
 45,
 166,
 89,
 56,
 125,
 78,
 73,
 85,
 161,
 100,
 65,
 142,
 82,
 22,
 19,
 51,
 91,
 83,
 58,
 61,
 33,
 19,
 44,
 50,
 44,
 31,
 11,
 20,
 29,
 64,
 51,
 33,
 50,
 33,
 26,
 2,
 24,
 0,
 30,
 23,
 17,
 20,
 24,
 15,
 24,
 9,
 14,
 17,
 4,
 4,
 5,
 2,
 14,
 9,
 1,
 1,
 11,
 12,
 5,
 19,
 7]
In [52]:
# Calculating the incremental values of the deceased cases
finland_deceas_incr_6 = calc_increments(finland_deceas_6)
In [53]:
# Extracting all data about Finland (from the first available day)
Finland_0 = extract_country("Finland", "Not applicable", 0)
# Extracting the confirmed cases
finland_conf_0 = Finland_0[0]
# Extracting the recovered cases
finland_recov_0 = Finland_0[1]
# Extracting the decased cases
finland_deceas_0 = Finland_0[2]

# Calculating the incremental values of the confirmed cases
finland_conf_incr_0 = calc_increments(finland_conf_0)
# Calculating the incremental values of the deceased cases
finland_deceas_incr_0 = calc_increments(finland_deceas_0)

# Extracting the dataseries from the first confirmed case in the Country
# by using the function extract_non_null
# (the function extracts all non null values, not only the leading zeros
# but this is OK since the total confirmed cases cannot decrease)

finland_conf_pos = extract_non_null(finland_conf_0)
In [54]:
# Using the function pop_perc to calculate the confirmed cumulative cases
# in percentage of the total population
finland_conf_0_perc = pop_perc(finland_conf_0, finland_pop)
# Doing the same for the deceased cases
finland_deceas_0_perc = pop_perc(finland_deceas_0, finland_pop)

5.4.5. Data from other Scandinavian Countries and Estonia

In [55]:
# Calling the function prep_country_data to extract data related
# to the other Scandinavian Countries

# 1. Skipping the first 6 days of the time series

# Denmark
denmark_6 = prep_country_data("Denmark", "Not applicable", 6)
denmark_conf_6 = denmark_6[0]
denmark_recov_6 = denmark_6[1]
denmark_deceas_6 = denmark_6[2]
denmark_conf_incr_6 = denmark_6[3]

# Norway
norway_6 = prep_country_data("Norway", "Not applicable", 6)
norway_conf_6 = norway_6[0]
norway_recov_6 = norway_6[1]
norway_deceas_6 = norway_6[2]
norway_conf_incr_6 = norway_6[3]

# Sweden
sweden_6 = prep_country_data("Sweden", "Not applicable", 6)
sweden_conf_6 = sweden_6[0]
sweden_recov_6 = sweden_6[1]
sweden_deceas_6 = sweden_6[2]
sweden_conf_incr_6 = sweden_6[3]

# Iceland
iceland_6 = prep_country_data("Iceland", "Not applicable", 6)
iceland_conf_6 = iceland_6[0]
iceland_recov_6 = iceland_6[1]
iceland_deceas_6 = iceland_6[2]
iceland_conf_incr_6 = iceland_6[3]

# 2. complete time series

# Denmark
denmark_0 = prep_country_data("Denmark", "Not applicable", 0)
denmark_conf_0 = denmark_0[0]
denmark_recov_0 = denmark_0[1]
denmark_deceas_0 = denmark_0[2]
denmark_conf_pos = denmark_0[4]
denmark_conf_0_perc = pop_perc(denmark_conf_0, denmark_pop)
denmark_deceas_0_perc = pop_perc(denmark_deceas_0, denmark_pop)

# Norway
norway_0 = prep_country_data("Norway", "Not applicable", 0)
norway_conf_0 = norway_0[0]
norway_recov_0 = norway_0[1]
norway_deceas_0 = norway_0[2]
norway_conf_pos = norway_0[4]
norway_conf_0_perc = pop_perc(norway_conf_0, norway_pop)
norway_deceas_0_perc = pop_perc(norway_deceas_0, norway_pop)

# Sweden
sweden_0 = prep_country_data("Sweden", "Not applicable", 0)
sweden_conf_0 = sweden_0[0]
sweden_recov_0 = sweden_0[1]
sweden_deceas_0 = sweden_0[2]
sweden_conf_pos = sweden_0[4]
sweden_conf_0_perc = pop_perc(sweden_conf_0, sweden_pop)
sweden_deceas_0_perc = pop_perc(sweden_deceas_0, sweden_pop)

# Iceland
iceland_0 = prep_country_data("Iceland", "Not applicable", 0)
iceland_conf_0 = iceland_0[0]
iceland_recov_0 = iceland_0[1]
iceland_deceas_0 = iceland_0[2]
iceland_conf_pos = iceland_0[4]
iceland_conf_0_perc = pop_perc(iceland_conf_0, iceland_pop)
iceland_deceas_0_perc = pop_perc(iceland_deceas_0, iceland_pop)
In [56]:
# Calling the function extract_country to extract data related to Estonia
estonia_0 = prep_country_data("Estonia", "Not applicable", 0)
estonia_conf_0 = estonia_0[0]
estonia_recov_0 = estonia_0[1]
estonia_deceas_0 = estonia_0[2]
estonia_conf_incr_0 = estonia_0[3]
estonia_deceas_incr_0 = calc_increments(estonia_deceas_0)
estonia_act_0 = list(np.array(estonia_conf_0) - \
                     np.array(estonia_recov_0) - \
                     np.array(estonia_deceas_0))
estonia_conf_pos = estonia_0[4]
estonia_conf_0_perc = pop_perc(estonia_conf_0, estonia_pop)
estonia_deceas_0_perc = pop_perc(estonia_deceas_0, estonia_pop)

estonia_6 = prep_country_data("Estonia", "Not applicable", 6)
estonia_conf_6 = estonia_6[0]
estonia_recov_6 = estonia_6[1]
estonia_deceas_6 = estonia_6[2]
estonia_conf_incr_6 = estonia_6[3]

5.4.6. Data from other European Countries

In [57]:
# Calling the function prep_country_data to extract data related to Italy
italy_0 = prep_country_data("Italy", "Not applicable", 0)
italy_conf_0 = italy_0[0]
italy_recov_0 = italy_0[1]
italy_deceas_0 = italy_0[2]
italy_conf_incr_0 = italy_0[3]
italy_deceas_incr_0 = calc_increments(italy_deceas_0)
italy_act_0 = list(np.array(italy_conf_0) - \
                   np.array(italy_recov_0) - \
                   np.array(italy_deceas_0))
italy_conf_pos = italy_0[4]
italy_conf_0_perc = pop_perc(italy_conf_0, italy_pop)
italy_deceas_0_perc = pop_perc(italy_deceas_0, italy_pop)
In [58]:
# Calling the function extract_country to extract data related to Spain
spain_0 = prep_country_data("Spain", "Not applicable", 0)
spain_conf_0 = spain_0[0]
spain_recov_0 = spain_0[1]
spain_deceas_0 = spain_0[2]
spain_conf_incr_0 = spain_0[3]
spain_deceas_incr_0 = calc_increments(spain_deceas_0)
spain_act_0 = list(np.array(spain_conf_0) - \
                   np.array(spain_recov_0) - \
                   np.array(spain_deceas_0))
spain_conf_pos = spain_0[4]
spain_conf_0_perc = pop_perc(spain_conf_0, spain_pop)
spain_deceas_0_perc = pop_perc(spain_deceas_0, spain_pop)
In [59]:
# Calling the function extract_country to extract data related to Germany
germany_0 = prep_country_data("Germany", "Not applicable", 0)
germany_conf_0 = germany_0[0]
germany_recov_0 = germany_0[1]
germany_deceas_0 = germany_0[2]
germany_conf_incr_0 = germany_0[3]
germany_deceas_incr_0 = calc_increments(germany_deceas_0)
germany_act_0 = list(np.array(germany_conf_0) - \
                     np.array(germany_recov_0) - \
                     np.array(germany_deceas_0))
germany_conf_pos = germany_0[4]
germany_conf_0_perc = pop_perc(germany_conf_0, germany_pop)
germany_deceas_0_perc = pop_perc(germany_deceas_0, germany_pop)
In [60]:
# Calling the function extract_country to extract data related to France
france_0 = prep_country_data("France", "Not applicable", 0)
france_conf_0 = france_0[0]
france_recov_0 = france_0[1]
france_deceas_0 = france_0[2]
france_conf_incr_0 = france_0[3]
france_deceas_incr_0 = calc_increments(france_deceas_0)
france_act_0 = list(np.array(france_conf_0) - \
                    np.array(france_recov_0) - \
                    np.array(france_deceas_0))
france_conf_pos = france_0[4]
france_conf_0_perc = pop_perc(france_conf_0, france_pop)
france_deceas_0_perc = pop_perc(france_deceas_0, france_pop)
In [61]:
# Calling the function extract_country to extract data related to Switzerland
switzerland_0 = prep_country_data("Switzerland", "Not applicable", 0)
switzerland_conf_0 = switzerland_0[0]
switzerland_recov_0 = switzerland_0[1]
switzerland_deceas_0 = switzerland_0[2]
switzerland_conf_incr_0 = switzerland_0[3]
switzerland_deceas_incr_0 = calc_increments(switzerland_deceas_0)
switzerland_act_0 = list(np.array(switzerland_conf_0) - \
                         np.array(switzerland_recov_0) - \
                         np.array(switzerland_deceas_0))
switzerland_conf_pos = switzerland_0[4]
switzerland_conf_0_perc = pop_perc(switzerland_conf_0, switzerland_pop)
switzerland_deceas_0_perc = pop_perc(switzerland_deceas_0, switzerland_pop)
In [62]:
# Calling the function extract_country to extract data related to Netherlands
netherlands_0 = prep_country_data("Netherlands", "Not applicable", 0)
netherlands_conf_0 = netherlands_0[0]
netherlands_recov_0 = netherlands_0[1]
netherlands_deceas_0 = netherlands_0[2]
netherlands_conf_incr_0 = netherlands_0[3]
netherlands_deceas_incr_0 = calc_increments(netherlands_deceas_0)
netherlands_act_0 = list(np.array(netherlands_conf_0) - \
                         np.array(netherlands_recov_0) - \
                         np.array(netherlands_deceas_0))
netherlands_conf_pos = netherlands_0[4]
netherlands_conf_0_perc = pop_perc(netherlands_conf_0, netherlands_pop)
netherlands_deceas_0_perc = pop_perc(netherlands_deceas_0, netherlands_pop)
In [63]:
# Calling the function extract_country to extract data related to Austria
austria_0 = prep_country_data("Austria", "Not applicable", 0)
austria_conf_0 = austria_0[0]
austria_recov_0 = austria_0[1]
austria_deceas_0 = austria_0[2]
austria_conf_incr_0 = austria_0[3]
austria_deceas_incr_0 = calc_increments(austria_deceas_0)
austria_act_0 = list(np.array(austria_conf_0) - \
                     np.array(austria_recov_0) - \
                     np.array(austria_deceas_0))
austria_conf_pos = austria_0[4]
austria_conf_0_perc = pop_perc(austria_conf_0, austria_pop)
austria_deceas_0_perc = pop_perc(austria_deceas_0, austria_pop)
In [64]:
# Calling the function extract_country to extract data related to Belgium
belgium_0 = prep_country_data("Belgium", "Not applicable", 0)
belgium_conf_0 = belgium_0[0]
belgium_recov_0 = belgium_0[1]
belgium_deceas_0 = belgium_0[2]
belgium_conf_incr_0 = belgium_0[3]
belgium_deceas_incr_0 = calc_increments(belgium_deceas_0)
belgium_act_0 = list(np.array(belgium_conf_0) - \
                     np.array(belgium_recov_0) - \
                     np.array(belgium_deceas_0))
belgium_conf_pos = belgium_0[4]
belgium_conf_0_perc = pop_perc(belgium_conf_0, belgium_pop)
belgium_deceas_0_perc = pop_perc(belgium_deceas_0, belgium_pop)
In [65]:
# Calling the function extract_country to extract data related to Portugal
portugal_0 = prep_country_data("Portugal", "Not applicable", 0)
portugal_conf_0 = portugal_0[0]
portugal_recov_0 = portugal_0[1]
portugal_deceas_0 = portugal_0[2]
portugal_conf_incr_0 = portugal_0[3]
portugal_deceas_incr_0 = calc_increments(portugal_deceas_0)
portugal_act_0 = list(np.array(portugal_conf_0) - \
                      np.array(portugal_recov_0) - \
                      np.array(portugal_deceas_0))
portugal_conf_pos = portugal_0[4]
portugal_conf_0_perc = pop_perc(portugal_conf_0, portugal_pop)
portugal_deceas_0_perc = pop_perc(portugal_deceas_0, portugal_pop)
In [66]:
# Calling the function extract_country to extract data related to Luxembourg
luxembourg_0 = prep_country_data("Luxembourg", "Not applicable", 0)
luxembourg_conf_0 = luxembourg_0[0]
luxembourg_recov_0 = luxembourg_0[1]
luxembourg_deceas_0 = luxembourg_0[2]
luxembourg_conf_incr_0 = luxembourg_0[3]
luxembourg_deceas_incr_0 = calc_increments(luxembourg_deceas_0)
luxembourg_act_0 = list(np.array(luxembourg_conf_0) - \
                        np.array(luxembourg_recov_0) - \
                        np.array(luxembourg_deceas_0))
luxembourg_conf_pos = luxembourg_0[4]
luxembourg_conf_0_perc = pop_perc(luxembourg_conf_0, luxembourg_pop)
luxembourg_deceas_0_perc = pop_perc(luxembourg_deceas_0, luxembourg_pop)
In [67]:
# Calling the function extract_country to extract data related to Poland
poland_0 = prep_country_data("Poland", "Not applicable", 0)
poland_conf_0 = poland_0[0]
poland_recov_0 = poland_0[1]
poland_deceas_0 = poland_0[2]
poland_conf_incr_0 = poland_0[3]
poland_deceas_incr_0 = calc_increments(poland_deceas_0)
poland_act_0 = list(np.array(poland_conf_0) - \
                    np.array(poland_recov_0) - \
                    np.array(poland_deceas_0))
poland_conf_pos = poland_0[4]
poland_conf_0_perc = pop_perc(poland_conf_0, poland_pop)
poland_deceas_0_perc = pop_perc(poland_deceas_0, poland_pop)
In [68]:
# Calling the function extract_country to extract data related to Ireland
ireland_0 = prep_country_data("Ireland", "Not applicable", 0)
ireland_conf_0 = ireland_0[0]
ireland_recov_0 = ireland_0[1]
ireland_deceas_0 = ireland_0[2]
ireland_conf_incr_0 = ireland_0[3]
ireland_deceas_incr_0 = calc_increments(ireland_deceas_0)
ireland_act_0 = list(np.array(ireland_conf_0) - \
                     np.array(ireland_recov_0) - \
                     np.array(ireland_deceas_0))
ireland_conf_pos = ireland_0[4]
ireland_conf_0_perc = pop_perc(ireland_conf_0, ireland_pop)
ireland_deceas_0_perc = pop_perc(ireland_deceas_0, ireland_pop)

5.4.7. Data from UK and US

In [69]:
# Calling the function prep_country_data to extract data related to UK
uk_0 = prep_country_data("United Kingdom", "Not applicable", 0)
uk_conf_0 = uk_0[0]
uk_recov_0 = uk_0[1]
uk_deceas_0 = uk_0[2]
uk_conf_incr_0 = uk_0[3]
uk_deceas_incr_0 = calc_increments(uk_deceas_0)
uk_act_0 = list(np.array(uk_conf_0) - \
                np.array(uk_recov_0) - \
                np.array(uk_deceas_0))
uk_conf_pos = uk_0[4]
uk_conf_0_perc = pop_perc(uk_conf_0, uk_pop)
uk_deceas_0_perc = pop_perc(uk_deceas_0, uk_pop)
In [70]:
# Calling the function prep_country_data to extract data related to US
us_0 = prep_country_data("US", "Not applicable", 0)
us_conf_0 = us_0[0]
us_recov_0 = us_0[1]
us_deceas_0 = us_0[2]
us_conf_incr_0 = us_0[3]
us_deceas_incr_0 = calc_increments(us_deceas_0)
us_act_0 = list(np.array(us_conf_0) - \
                np.array(us_recov_0) - \
                np.array(us_deceas_0))
us_conf_pos = us_0[4]
us_conf_0_perc = pop_perc(us_conf_0, us_pop)
us_deceas_0_perc = pop_perc(us_deceas_0, us_pop)

5.4.8. Data from Brazil, Russia and India

In [71]:
brazil_0 = prep_country_data("Brazil", "Not applicable", 0)
brazil_conf_0 = brazil_0[0]
brazil_recov_0 = brazil_0[1]
brazil_deceas_0 = brazil_0[2]
brazil_conf_incr_0 = brazil_0[3]
brazil_deceas_incr_0 = calc_increments(brazil_deceas_0)
brazil_act_0 = list(np.array(brazil_conf_0) - \
                    np.array(brazil_recov_0) - \
                    np.array(brazil_deceas_0))
brazil_conf_pos = brazil_0[4]
brazil_conf_0_perc = pop_perc(brazil_conf_0, brazil_pop)
brazil_deceas_0_perc = pop_perc(brazil_deceas_0, brazil_pop)
In [72]:
russia_0 = prep_country_data("Russia", "Not applicable", 0)
russia_conf_0 = russia_0[0]
russia_recov_0 = russia_0[1]
russia_deceas_0 = russia_0[2]
russia_conf_incr_0 = russia_0[3]
russia_deceas_incr_0 = calc_increments(russia_deceas_0)
russia_act_0 = list(np.array(russia_conf_0) - \
                    np.array(russia_recov_0) - \
                    np.array(russia_deceas_0))
russia_conf_pos = russia_0[4]
russia_conf_0_perc = pop_perc(russia_conf_0, russia_pop)
russia_deceas_0_perc = pop_perc(russia_deceas_0, russia_pop)
In [73]:
india_0 = prep_country_data("India", "Not applicable", 0)
india_conf_0 = india_0[0]
india_recov_0 = india_0[1]
india_deceas_0 = india_0[2]
india_conf_incr_0 = india_0[3]
india_deceas_incr_0 = calc_increments(india_deceas_0)
india_act_0 = list(np.array(india_conf_0) - \
                    np.array(india_recov_0) - \
                    np.array(india_deceas_0))
india_conf_pos = india_0[4]
india_conf_0_perc = pop_perc(india_conf_0, india_pop)
india_deceas_0_perc = pop_perc(india_deceas_0, india_pop)

5.4.9. Data from China

In [74]:
# Daily Report from China broken by Provinces
daily_rep_short[daily_rep_short['Country_Region'] == "China"]
Out[74]:
Province_State Country_Region Confirmed Deaths Recovered Active Incidence_Rate Case-Fatality_Ratio
3125 Anhui China 991 6 985 0 1.56705 0.605449
3156 Beijing China 905 9 585 311 4.20149 0.994475
3204 Chongqing China 582 6 573 3 1.87621 1.03093
3236 Fujian China 363 1 357 5 0.921086 0.275482
3241 Gansu China 163 2 140 21 0.618127 1.22699
3256 Guangdong China 1637 8 1622 7 1.4428 0.488699
3257 Guangxi China 254 2 252 0 0.515631 0.787402
3260 Guizhou China 147 2 145 0 0.408333 1.36054
3263 Hainan China 171 6 165 0 1.83084 3.50877
3267 Hebei China 349 6 331 12 0.461885 1.7192
3268 Heilongjiang China 947 13 934 0 2.50994 1.37276
3269 Henan China 1276 22 1254 0 1.32847 1.72414
3275 Hong Kong China 1197 7 1095 95 15.9664 0.584795
3278 Hubei China 68135 4512 63623 0 115.151 6.62215
3280 Hunan China 1019 4 1015 0 1.47703 0.392542
3285 Inner Mongolia China 238 1 236 1 0.939227 0.420168
3297 Jiangsu China 654 0 653 1 0.812321 0
3298 Jiangxi China 932 1 931 0 2.00516 0.107296
3299 Jilin China 155 2 153 0 0.573225 1.29032
3344 Liaoning China 154 2 147 5 0.353292 1.2987
3354 Macau China 46 0 45 1 7.08409 0
3408 Ningxia China 75 0 75 0 1.09012 0
3463 Qinghai China 18 0 18 0 0.298507 0
3506 Shaanxi China 320 3 307 10 0.828157 0.9375
3507 Shandong China 792 7 785 0 0.788295 0.883838
3508 Shanghai China 707 7 675 25 2.91667 0.990099
3509 Shanxi China 198 0 198 0 0.532544 0
3513 Sichuan China 589 3 575 11 0.70615 0.509338
3541 Tianjin China 198 3 193 2 1.26923 1.51515
3542 Tibet China 1 0 1 0 0.0290698 0
3599 Xinjiang China 76 3 73 0 0.305589 3.94737
3607 Yunnan China 185 2 183 0 0.383023 1.08108
3612 Zhejiang China 1269 1 1267 1 2.21196 0.0788022
In [75]:
print("Number of entries related to China:")
len(daily_rep_short[daily_rep_short['Country_Region'] == "China"])
Number of entries related to China:
Out[75]:
33
In [76]:
# Extracting data related to Hubei province by screning out the text variables
# and putting the result in list format
hubei_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
                                (world_conf_short['Province/State'] == 'Hubei')]
hubei_conf_0 = hubei_conf_0.iloc[:, 2:].values.tolist()[0]
hubei_conf_incr_0 = calc_increments(hubei_conf_0)
hubei_conf_0_perc = pop_perc(hubei_conf_0, hubei_pop)

hubei_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
                                  (world_recov_short['Province/State'] == 'Hubei')]
hubei_recov_0 = hubei_recov_0.iloc[:, 2:].values.tolist()[0]

hubei_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
                                    (world_deceas_short['Province/State'] == 'Hubei')]
hubei_deceas_0 = hubei_deceas_0.iloc[:, 2:].values.tolist()[0]

hubei_act_0 = list(np.array(hubei_conf_0) - \
                   np.array(hubei_recov_0) - \
                   np.array(hubei_deceas_0))

# Extracting data related to all the other provinces, making the sum
# and putting the result in list format
restchina_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
                                    (world_conf_short['Province/State'] !=  'Hubei')]
restchina_conf_0 = restchina_conf_0.groupby(['Country/Region']).sum()
restchina_conf_0 = restchina_conf_0.values.tolist()[0]
restchina_conf_incr_0 = calc_increments(restchina_conf_0)
restchina_conf_0_perc = pop_perc(restchina_conf_0, restchina_pop)

restchina_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
                                      (world_recov_short['Province/State'] !=  'Hubei')]
restchina_recov_0 = restchina_recov_0.groupby(['Country/Region']).sum()
restchina_recov_0 = restchina_recov_0.values.tolist()[0]

restchina_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
                                        (world_deceas_short['Province/State'] !=  'Hubei')]
restchina_deceas_0 = restchina_deceas_0.groupby(['Country/Region']).sum()
restchina_deceas_0 = restchina_deceas_0.values.tolist()[0]

restchina_act_0 = list(np.array(restchina_conf_0) - \
                       np.array(restchina_recov_0) - \
                       np.array(restchina_deceas_0))

5.5. Summary of the Created Datasets

Within this document, different datasets are used for different purposes. This section provides a summary as a useful reference and describes the naming rules that have been used. Those variables that have been created temporarily just for reason of code clarity are not included in this list.

world_conf_clean

  • Dataframe based on world_confirmed (ime_series_covid19_confirmed_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

world_recov_clean

  • Dataframe based on world_recovered (ime_series_covid19_recovered_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

world_deceas_clean

  • Dataframe based on world_deceased (ime_series_covid19_deaths_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

daily_rep_clean

  • Dataframe based on daily_report (mm-dd-yyyy.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"



world_conf_short, world_recov_short, world_deceas_short

  • Dataframe based on world_conf_clean, world_recov_clean, world_deceas_clean
  • GPS coordinates have been dropped

world_conf, world_recov, world_deceas

  • Dataframe based on world_conf_short, world_recov_short, world_deceas_short
  • Only columns with daily data have been selected

world_conf_tot, world_recov_tot, world_deceas_tot

  • Dataframe based on world_conf, world_recov, world_deceas
  • The overall worldwide daily sum has been calculated

world_act_tot

  • List based on world_conf_x, world_recov_x, world_deceas_x (the second and third are subtracted from the first) containing the active cases

world_conf_incr

  • Dataframe based on world_conf_tot containing the daily increments

world_deceas_incr

  • Dataframe based on world_deceas_tot containing the daily increments

daily_rep_short

  • Dataframe based on daily_rep_clean
  • All columns not containing cases counts have been dropped



world_conf_group, world_recov_group, world_deceas_group

  • Dataframe based on world_conf_short, world_recov_short, world_deceas_short
  • Data grouped by Country/Region

daily_rep_group

  • Dataframe based on daily_rep_short
  • Data grouped by Country/Region



days_tot

  • List obtained by using world_confirmed which contains the dates of all the days in m/d format

days_fin

  • List based on days_tot where the first 6 days have been dropped



country_conf_x, country_recov_x, country_deceas_x

  • where country is the Country written with small letters
  • where x is the number of days to skip in the time series starting from the first one
  • Lists obtained by using world_confirmed, world_recovered, world_deceased
  • Data related to Country has been extracted
  • Data related to the first x days has been dropped

country_act_x

  • List based on country_conf_x, country_recov_x, country_deceas_x (the second and third are subtracted from the first) containing the active cases

country_conf_incr_x

  • List based on country_conf_x containing the daily increments

country_deceas_incr_x

  • List based on country_deceas_x containing the daily increments

country_conf_0_perc

  • List based on country_conf_0
  • It containing the confirmed cumulative cases in percentage of the total population

country_deceas_0_perc

  • List based on country_deceas_0
  • It containing the deceased cumulative cases in percentage of the total population

country_conf_pos

  • where country is the Country written with small letters
  • Lists based on country_conf_0
  • Data related to days with zero cumulative cases in the Country has been dropped

6. Domain-Specific Concepts

The basic reproductive number, R0 is the average number of secondary infections generated by one infectious individual. When R0 > 1 the infection is able to spread. The aim of the non-pharmaceutical interventions (NPIs), as social distancing, is to reduce the value of R0.

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-transmissibility-25-01-2020.pdf

The Case Fatality Ratio (CFR) is the proportion of detected cases of a given disease that die as a result of it.

Surveillance is typically biased towards detecting clinically severe cases, particularly at the start of an epidemic when diagnostic capacity is limited. This leads to an over estimation of the CFR.

On the other hand, there is a time interval (2/3 weeks) between the onset of symptoms and death or recovery. Therefore, measuring the simple ratio deceased/infected during a growing epidemic does not allow to observe the outcome of all the infected cases, leading to a under estimation of the CFR.

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-severity-10-02-2020.pdf

NOTE: The Infection Fatality Rate is the percentage of people that get the infection and then die. This number is much harder to estimate compared to the CFR since we do not know the total amount of people that have been really infected in a certain area.

7. Data Visualization

7.1. Overview

7.1.1. General Comments to the Plots

The following curves are shown in the plots contained in this section:

  • Cumulative confirmed cases
  • Cumulative recovered cases
  • Cumulative deceased cases
  • Cumulative active cases
  • Daily increments in the confirmed cases
  • Daily increments in the recovered cases
  • Daily increments in the deceased cases
  • Daily increments in the active cases

The first four curves show the cumulative cases in a certain region since the start of the epidemic.

The cumulative confirmed cases curve is expected to grow exponentially and then slowly smoothing out towards a horizontal shape. Government decisions and people behavior can affect the way this curve looks like. The aim is to keep the curve not too steep in order not to saturate the capacity of the hospitals in the Country. However, it should be noted that the effects of Government and people actions are not immediate due to the incubation period.

The cumulative recovered cases curve follows the cumulative confirmed cases with a certain delay in time and a lower y value due to the amount of deceased cases.

The cumulative active cases are given by the confirmed cases minus the recovered cases minus the deceased cases. It is the only one of the cumulative cases curves that can decrease over time and this happens when the number of confirmed cases grows slower than the combined number of recovered and deceased cases. This curve is expected to have an (upside down) bell shape.

The new confirmed daily cases show the speed at which the virus is spreading. This curve is expected to have an (upside down) bell shape. This curve shows the daily values and therefore is shows also some noise. Some of this noise might be due to mistakes in reporting the daily data (sometimes data of a certain day is reported together with the next day data). This kind of mistake does not affect the grand total and affects only very little the trend of the curves.

The new recovered daily cases curve looks similar to the new confirmed daily cases curve with a delay in time and lower y values.

The incremental daily active cases curve shows two picks of opposite sign. The x value where the negative curve starts corresponds to the pick of the corresponding cumulative curve.

NOTE: The number of the actual confirmed cases is very likely above the number of the counted confirmed cases since not all population is tested and there might be many infected persons showing no symptoms. However, by assuming a constant testing policy during the all observation period, the rate of changes is unaffected by systematic under-reporting and therefore there is a lot of useful information that can be obtained by those curves.


"The only real data we have is from the flights used by a number of Countries to repatriate their citizens. The all population was tested on those planes. If the population samples given by the passengers of those flights would be representative of the all population, we could conclude that the epidemic is at least 3 times larger compared to what the collected data shows."

Feb 12th, Prof. Neil Ferguson, https://www.imperial.ac.uk/people/neil.ferguson


"By comparing the number of flights that came into a certain Country from the worst affected area in China (Wuhan City) with the cases detected in that Country, it can be bound that the number of cases per flight varies quite a lot depending on the Country.

Singapore had a relatively high number of cases compared to other Countries. By using that data as a benchmark, that is, by assuming the Singapore has detected all the cases, the result is that worldwide approximately 2/3 of the cases have not been detected."

Professor Christl Donnelly, https://www.imperial.ac.uk/people/c.donnelly


More recent serological tests show that the number of actual cases might be up to 10/20 times the number of counted confirmed cases.

7.1.2. A Reference Curve Set

The first complete curves are related to China. Let's analyze the curves related to China either than Hubei province. The curves can be divided in 4 phases which are named here after the shape of the cumulative confirmed cases curve.

1) Exponential increase phase

  • In the first phase the number of cumulative confirmed cases grows exponentially (it grows and it grows faster each day) while the number of recovered and deceased cases is still null (the number of cumulative confirmed cases corresponds to the number of active cases, which corresponds to the number of the "Infectious" in the popular epidemiologic SIR model (')). The increments in the number of confirmed cases shows the left side of a bell shape. The same happens for the incremental active cases.

2) Linear increase phase

  • In the second phase the number of confirmed cases grows at a quite constant speed (the cumulative cases grow in a straight line and the increment curve starts to flatten). In the middle of this phase we see the pick in the number of incremental confirmed cases (R0 has decreased to 1). There is also a pick on the incremental active cases. In this phase we see a quite modest increase in recovered and deceased cases and we start to see that the cumulative active cases curve and the cumulative confirmed cases curve take their own path.

3) Slowed-down increase phase

  • In the third phase the number of confirmed cases grows at a slower and slower speed (the cumulative curve starts to flatten towards a horizontal shape and the incremental confirmed cases curve shows the right side of the bell). R0 starts to decrease below 1. In this phase the number of cumulative recovered and deceased cases keeps growing and the number of cumulative active cases reaches a pick and then starts to decrease. The pick on the cumulative active cases is known as Herd Immunity. In the incremental active cases curve this is seen as the point when the curve changes sign. There is a lag in time between the pick in new confirmed cases seen in phase 2 and the pick in active cases seen in this phase.

4) No increase phase

  • In the fourth phase the number of cumulative confirmed cases remains constant and consequently the corresponding incremental curve is zero (R0 is almost 0). The number of recovered and deceased cases keeps growing and the active cases decrease down towards zero.

Note that a new wave might follow (as it might happen in China outside Hubei).

Note that should the testing policy change during the observation period, the curve might look different.

Whenever containment measures have been adopted in a certain area, the earliest moment in time when it makes sense to start to release them gradually is after the Herd Immunity pick. However, in this case the Herd Immunity has been obtained under certain conditions (the containment measures) and therefore, as soon as those conditions are released, the Heard Immunity is no longer valid. Release of containment measures might cause the curves to differ from this example and might lead to new picks before the active cases curves goes to zero.

(') https://medium.com/data-for-science/epidemic-modeling-101-or-why-your-covid19-exponential-fits-are-wrong-97aa50c55f8

In [77]:
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, restchina_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in China "\
                     "either than Hubei over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               first_line_x='1/30', first_line_col=7, first_line_ls=':',
               first_line_x_l='End of the exponential increase phase',
               second_line_x='2/5', second_line_col=7, second_line_ls='--',
               second_line_x_l='End of the linear increase phase',
               third_line_x='2/22', third_line_col=7, third_line_ls='-.',
               third_line_x_l='End of the slowed-down increase phase',
               fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
               fourth_line_x_l='End of the no increase phase',
               fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
               fifth_line_x_l='Herd Immunity')
In [78]:
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, "New daily confirmed cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in China "\
                     "either than Hubei",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               first_line_x='1/30', first_line_col=7, first_line_ls=':',
               first_line_x_l='End of the exponential increase phase',
               second_line_x='2/5', second_line_col=7, second_line_ls='--',
               second_line_x_l='End of the linear increase phase',
               third_line_x='2/22', third_line_col=7, third_line_ls='-.',
               third_line_x_l='End of the slowed-down increase phase',
               fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
               fourth_line_x_l='End of the no increase phase',
               fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
               fifth_line_x_l='Herd Immunity')
In [79]:
# Plotting new daily deceased cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_deceas_0), 3, 
               "Daily deceased cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily (reported) deceased cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=12, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')
In [80]:
# Plotting new daily recovered cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_recov_0), 2, 
               "Daily recovered cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily (reported) recovered cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=12, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')
In [81]:
# Plotting daily increments in the active cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_act_0), 1, 
               "Daily increments in the active cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 increments in the daily active cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=12, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')

7.2. Finnish Internal Situation

Unfortunately, the Finnish Institute for Health and Welfare (THL) does not publish reliable daily data about the recovered cases and therefore it is not possible to draw an accurate curve for the active cases.

Notes:

The increased speed in the confirmed cases on 4/4 is due to change in testing policy.

The confirmed cases data from 3/12 has been reported on 3/13.

Obviously, there is something wrong in the source data since it shows that the cumulative deaths on 1/6 are smaller than the cumulative deaths of 31/5, hence the negative value for the increment in deceased cases on 1/6. The same applies to 4/6.

In [82]:
print("Error data in deceased cases in Finland:")
find_error_days(finland_deceas_0)
Error data in deceased cases in Finland:
['4/6', '6/1']
In [83]:
# Plotting daily cumulative cases in Finland
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "confirmed cases"),
               #(days_fin, finland_recov_6, ".", '-', 2, "recovered cases"),
               (days_fin, finland_deceas_6, ".", '-', 3, "deceased cases"),
               #(days_fin, finland_act_6, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Finland over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0,
               first_line_x='3/12', first_line_col=6,
               first_line_ls=':', first_line_x_l='First actions',
               second_line_x='3/16', second_line_col=6,
               second_line_ls='--', second_line_x_l='State of emergency declared',
               third_line_x='3/28', third_line_col=6,
               third_line_ls='-.', third_line_x_l='Additional actions',
               fourth_line_x='4/11', fourth_line_col=6,
               fourth_line_ls='-', fourth_line_x_l='Tighter border control',
               fifth_line_x='4/15', fifth_line_col=8,
               fifth_line_ls='-.', fifth_line_x_l='Uusima border opened',
               sixth_line_x='5/14', sixth_line_col=8,
               sixth_line_ls='--', sixth_line_x_l='More releasing misures',
               seventh_line_x='6/1', seventh_line_col=8,
               seventh_line_ls=':', seventh_line_x_l='Further releasing',
               eighth_line_x='6/15', eighth_line_col=8,
               eighth_line_ls='-', eighth_line_x_l='End of state of emergency')
In [84]:
print("Concrete actions by the Finnish government")
measures.style.set_properties(**{'text-align': 'left'}).\
set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ]).hide_index()
Concrete actions by the Finnish government
Out[84]:
Date Actions
12.3. First containment measures: gathering of more than 500 people banned
16.3. State of emergency declared: closing shools, universities, museums, theatres, libraries, sport facilities; gathering of more than 10 people banned
28.3. Additional containment measures: Uusimaa region borders closed, restaurant dining forbidden
11.4. Additional containment measures: No passengers in ships from Germany, Sweden, Estonia
15.4. First releasing measures: Uusima border re-opened
14.5. More releasing misures: schools opening, business travell allowed within Schengen
1.6. Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, reopening of museums and theatres
15.6. End of state of emergency
In [85]:
# Plotting new daily confirmed Coronavirus cases in Finland
cust_bar_plot((days_fin, finland_conf_incr_6, 0, "New daily confirmed cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in Finland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0,
               first_line_x='3/12', first_line_col=6,
               first_line_ls=':', first_line_x_l='First actions',
               second_line_x='3/16', second_line_col=6,
               second_line_ls='--', second_line_x_l='State of emergency declared',
               third_line_x='3/28', third_line_col=6,
               third_line_ls='-.', third_line_x_l='Additional actions',
               fourth_line_x='4/11', fourth_line_col=6,
               fourth_line_ls='-', fourth_line_x_l='Tighter border control',
               fifth_line_x='4/15', fifth_line_col=8,
               fifth_line_ls='-.', fifth_line_x_l='Uusima border opened',
               sixth_line_x='5/14', sixth_line_col=8,
               sixth_line_ls='--', sixth_line_x_l='More releasing misures',
               seventh_line_x='6/1', seventh_line_col=8,
               seventh_line_ls=':', seventh_line_x_l='Further releasing',
               eighth_line_x='6/15', eighth_line_col=8,
               eighth_line_ls='-', eighth_line_x_l='End of state of emergency')
In [86]:
# Plotting new daily deceased cases in Finland
cust_bar_plot((days_fin, finland_deceas_incr_6, 3, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily deceased cases in Finland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.3. Comparison with the Closest Neighboring Countries

Sweden and Russia have a much higher number of cumulative confirmed cases per capita compared to Finland.

The corresponding figure considering the all world is similar to Finland but it is on an increasing path whereas Finnish curve tends to remain constant.

Also, Sweden has currently a much higher number of cumulative deaths per capita compared to Finland. Therefore, at least for Sweden, it is unlikely that the comparison is biased by a different testing policy.

In [87]:
# Comparing Finnish per capita cumulative confirmed cases with Sweden and Russia
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, sweden_conf_0_perc, ".", '-', 1, "Sweden"),
               (days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
               (days_tot, russia_conf_0_perc, ".", '-', 3, "Russia"),
               (days_tot, estonia_conf_0_perc, ".", '-', 7, "Estonia"),
               (days_tot, world_conf_perc, ".", '-', 4, "World Total"),
               figsize_w=18, figsize_h=12,
               title="COVID-19 per capita cumulative confirmed cases "\
                     "in Finland compared to the closest neighboring countries",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [88]:
# Comparing Finnish per capita cumulative deceased cases with Sweden and Russia
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, sweden_deceas_0_perc, ".", '-', 1, "Sweden"),
               (days_tot, norway_deceas_0_perc, ".", '-', 6, "Norway"),
               (days_tot, russia_deceas_0_perc, ".", '-', 3, "Russia"),
               (days_tot, estonia_deceas_0_perc, ".", '-', 7, "Estonia"),
               (days_tot, world_deceas_perc, ".", '-', 4, "World Total"),
               figsize_w=18, figsize_h=12,
               title="COVID-19 per capita cumulative deceased cases "\
                     "in Finland compared to the closest neighboring countries",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.3.1. Comparison with Other Scandinavian Countries and Estonia

Description of the plots of this section

It appears that the Finnish curve is quite smooth compared to the other curves. Only Iceland and Estonia have a smoother curve. This would suggest that the virus is not spreading faster in Finland compared to most of the other Scandinavian Countries. By shifting all the curves so that they start for each Country in the day of the first confirmed case, the Finnish curve is the slowest to grow but then crosses the curves of Iceland and Estonia.

Even though the virus started later in Finland, the first recovered case happened much earlier than other Scandinavian Countries.

Finland has the lowest number of deceased cases after Norway and Estonia (Sweden has the highest).

The high numbers for Sweden do not surprise due to the quite relaxed containment policy in the Country.

NOTE: It should be noted that the testing policy in each Country affects considerably the way the curve looks like. The less people you test, the better the curve looks like.

NOTE: The data from Denmark does not include Faroe Islands and Greenland.

In [89]:
# Comparing cumulative confirmed cases over time in Scandinavia plus Estonia
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "Finland"),
               (days_fin, denmark_conf_6, ".", '-', 3, "Denmark"),
               (days_fin, norway_conf_6, ".", '-', 6, "Norway"),
               (days_fin, sweden_conf_6, ".", '-', 8, "Sweden"),
               (days_fin, iceland_conf_6, ".", '-', 4, "Iceland"),
               (days_fin, estonia_conf_6, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases in "\
                     "Scandinavia and Estonia over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [90]:
# Comparing cumulative confirmed cases over time in Scandinavia plus Estonia
# starting form the day of the first confirmed case in Finland
cust_line_plot((list(range(len(finland_conf_pos))), finland_conf_pos,
                ".", '-', 0, "Finland"),
               (list(range(len(denmark_conf_pos))), denmark_conf_pos,
                ".", '-', 3, "Denmark"),
               (list(range(len(norway_conf_pos))), norway_conf_pos,
                ".", '-', 6, "Norway"),
               (list(range(len(sweden_conf_pos))), sweden_conf_pos,
                ".", '-', 8, "Sweden"),
               (list(range(len(iceland_conf_pos))), iceland_conf_pos,
                ".", '-', 4, "Iceland"),
               (list(range(len(estonia_conf_pos))), estonia_conf_pos,
                ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Scandinavia and Estonia over time",
               title_fs=18, title_offset=20,
               rem_borders=True, 
               label_fs=12, tick_fs=12, 
               x_label="Days since the first confirmed case in the Country",
               rot=0,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [91]:
# Comparing new daily confirmed Coronavirus cases in Scandinavia plus Estonia
cust_line_plot((days_fin, finland_conf_incr_6, ".", '-', 0, "Finland"),
               (days_fin, denmark_conf_incr_6, ".", '-', 3, "Denmark"),
               (days_fin, norway_conf_incr_6, ".", '-', 6, "Norway"),
               (days_fin, sweden_conf_incr_6, ".", '-', 8, "Sweden"),
               (days_fin, iceland_conf_incr_6, ".", '-', 4, "Iceland"),
               (days_fin, estonia_conf_incr_6, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in Scandinavia and Estonia",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

Comments to the next two plots:

Data related to Iceland is corrupted (cumulative data cannot decrease) so the related plot is not shown.

In [92]:
# Comparing cumulative recovered cases over time in Scandinavia plus Estonia
plot_stacked_bar(days_fin,
                 [finland_recov_6, denmark_recov_6, norway_recov_6, sweden_recov_6, estonia_recov_6],
                 ["Finland", "Denmark", "Norway", "Sweden", "Estonia"],
                 col=[0, 3, 6, 8, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="Coronavirus COVID-19 cumulative (reported) recovered cases in "\
                       "Scandinavia and Estonia over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=12, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [93]:
# Comparing cumulative deceased cases over time in Scandinavia plus Estonia
plot_stacked_bar(days_fin,
                 [finland_deceas_6, denmark_deceas_6, norway_deceas_6, sweden_deceas_6, estonia_deceas_6],
                 ["Finland", "Denmark", "Norway", "Sweden", "Estonia"],
                 col=[0, 3, 6, 8, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="Coronavirus COVID-19 cumulative (reported) deceased cases in "\
                       "Scandinavia and Estonia over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=12, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)

7.4. Comparison with other European Countries

Finland has also the lowest curves compared to other European Countries (except for Luxemburg). However, it shall be noted that those are absolute values which are not normalized by taking into consideration the Country population.

The plots related to the new confirmed cases show the same pattern for all those Countries (except for Poland). This might be due to the fact that those plots are very much dependent on how many people are tested in a certain day.

Switzerland has managed to keep a relatively low curve. France has experienced a noticeable increase in the recorded confirmed cases around 4/11. Germany has managed to keep a low curve of the deceased cases despite the relatively high curve of the confirmed cases.

NOTE: When comparing those curves please note also that the testing policy in each Country affects considerably the way the curve looks like. The less people you test, the better the curve looks like.

NOTE: The data from France and Netherlands does not include offshore territories.

NOTE: Obviously, the following data is wrong since the cumulative data cannot decrease (leading to a negative daily increment):

In [94]:
print("Error data in confirmed cases in Spain:")
find_error_days(spain_conf_0)
print("Error data in confirmed cases in France:")
find_error_days(france_conf_0)
print("Error data in confirmed cases in Portugal:")
find_error_days(portugal_conf_0)
print("Error data in deceased cases in Spain:")
find_error_days(spain_deceas_0)
print("Error data in deceased cases in France:")
find_error_days(france_deceas_0)
Error data in confirmed cases in Spain:
['4/24', '5/25']
Error data in confirmed cases in France:
['4/18', '4/22', '4/29', '5/13', '5/16', '5/24', '5/26', '6/2', '6/4', '6/16', '6/24', '6/25', '6/27']
Error data in confirmed cases in Portugal:
['5/2']
Error data in deceased cases in Spain:
['5/25']
Error data in deceased cases in France:
['5/16', '5/19', '6/27']
In [95]:
# Comparing cumulative confirmed cases over time for Finland,
# Italy, Spain, Germany, France, Switzerland, Belgium, Netherlands and Portugal
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0, ".", '-', 3, "France"),              
               (days_tot, switzerland_conf_0, ".", '-', 6, "Switzerland"),
               (days_tot, belgium_conf_0, ".", '-', 8, "Belgium"),
               (days_tot, netherlands_conf_0, ".", '-', 7, "Netherlands"),
               (days_tot, portugal_conf_0, ".", '-', 5, "Portugal"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland compared to \nItaly, Spain, Germany, France "\
                     "Switzerland, Belgium, Netherlands and Portugal",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [96]:
# Comparing cumulative deceased cases over time for Finland,
# Italy, Spain, Germany, France, Switzerland, Belgium, Netherlands and Portugal
cust_line_plot((days_tot, finland_deceas_0, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0, ".", '-', 3, "France"),              
               (days_tot, switzerland_deceas_0, ".", '-', 6, "Switzerland"),
               (days_tot, belgium_deceas_0, ".", '-', 8, "Belgium"),
               (days_tot, netherlands_deceas_0, ".", '-', 7, "Netherlands"),
               (days_tot, portugal_deceas_0, ".", '-', 5, "Portugal"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland compared to \nItaly, Spain, Germany, France "\
                     "Switzerland, Belgium and Netherlands and Portugal",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [97]:
# Comparing cumulative confirmed cases over time for Finland,
# Austria, Luxembourg, Ireland and Poland
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),               
               (days_tot, austria_conf_0, ".", '-', 3, "Austria"),
               (days_tot, luxembourg_conf_0, ".", '-', 9, "Luxembourg"),
               (days_tot, ireland_conf_0, ".", '-', 1, "Ireland"),
               (days_tot, poland_conf_0, ".", '-', 6, "Poland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland compared to \nAustria, "\
                     "Luxembourg, Ireland and Poland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [98]:
# Comparing cumulative deceased cases over time for Finland,
# Austria, Luxembourg, Ireland and Poland
cust_line_plot((days_tot, finland_deceas_0, ".", '-', 0, "Finland"),               
               (days_tot, austria_deceas_0, ".", '-', 3, "Austria"),
               (days_tot, luxembourg_deceas_0, ".", '-', 9, "Luxembourg"),
               (days_tot, ireland_deceas_0, ".", '-', 1, "Ireland"),
               (days_tot, poland_deceas_0, ".", '-', 6, "Poland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland compared to \nAustria, "\
                     "Luxembourg, Ireland and Poland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.5. UK and US

UK and US have followed quite relaxed policies in containing the spread of the virus during the first days.

NOTE: The data from UK does not include the Isle of Man and the Channel Islands.

NOTE: The following data is wrong since the cumulative data cannot decrease (leading to a negative daily increment):

In [99]:
print("Error data in confirmed cases in UK:")
find_error_days(uk_conf_0)
Error data in confirmed cases in UK:
['5/20']
In [100]:
# Comparing cumulative confirmed Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_conf_0, ".", '-', 0, "Finland"),
               (days_tot, uk_conf_0, ".", '-', 4, "UK"),
               (days_tot, us_conf_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [101]:
# Comparing cumulative deceased Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_deceas_0, ".", '-', 0, "Finland"),
               (days_tot, uk_deceas_0, ".", '-', 4, "UK"),
               (days_tot, us_deceas_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [102]:
# Comparing new daily confirmed cases Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_conf_incr_0, ".", '-', 0, "Finland"),
               (days_tot, uk_conf_incr_0, ".", '-', 4, "UK"),
               (days_tot, us_conf_incr_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.6. Brazil, Russia and India

Whereas in the first half of the year the virus has hit mostly China, Europe and the US, by the end of winter the number of active cases in China was down to very low values and the same has happened in most of Europe by the end of spring.

However, in other parts of the world, like Russia, India and Brazil the curves are still in a growing phase at the beginning of summer.

The following chart shows the cumulative confirmed cases in those 3 Countries. For reason of scale, Finnish curve would not bi visible in the same chart so Italy curve has been added to show a comparison with the number of cases over time in one of the most hit Countries in Europe.

The second chart shows a similar plot for the deceased cases (which are less sensible to the Country specific testing policy).

While Brazil and Russia have entered the linear growing phase, India is still in the exponential growing phase.

In [103]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy
cust_line_plot((days_tot, brazil_conf_0, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0, ".", '-', 1, "India"),
               (days_tot, italy_conf_0, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [104]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy
cust_line_plot((days_tot, brazil_deceas_0, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.7. Normalizing by Country population

7.7.1. List of Variables Affecting Potentially the Curves

The curves related to the cumulative confirmed cases seem to have similar shape. The main difference seems to be the height.

The height of those curves can differ for different reasons, including:

  • the Country overall population (obviously the more people are in the Country, the more people can get infected)
  • the population density (higher is the population density, easier it might be for the virus to spread)
  • demographics (older is the population, easier is for the virus to kill)
  • average health conditions of the population (healthier is the population, harder is for the virus to kill)
  • genetics ?
  • climate (the virus might have more difficulty to survive in too cold or too hot weather)
  • pollution (there are preliminary indications that pollution might facilitate the spread of the virus)
  • possible mutations of the virus in that area
  • the testing policy in the Country (the more people a Country tests, the more infected cases might be discovered)
  • which containment measures have been taken by Authorities and how early
  • how well the population has followed the containment measures
  • whether the Country is in a central area and whether there is a lot of movement of people
  • last but not least, the stage in which the Country is (the curves follow all the same smooth-steep-smooth shape so Countries where the virus has just started to spread show lower curves)

It might be interesting to isolate the first variable, Country population, by dividing the values by the Country population in order to calculate the amount of cases per capita. The result is shown in plots in this section.

NOTE: The Country population figures are approximative.

7.7.2. Confirmed Cases: Summary of Findings from the Analysis

The plots show that the other variables still can affect the curve as much as 10 times..

When comparing Scandinavian Countries and Estonia, Finland has the lowest number of confirmed cases per capita. Iceland has the highest number.

Among the analyzed European Countries, Luxemburg has the highest confirmed cases curve, followed by Spain and Belgium (which have values that are comparable with Iceland). Poland is the only Country among those ones that have been analyzed, that has a confirmed cases curve lower than Finland.

Note that one of the reasons why UK and Finland curves started pretty low might be due to the fact that they are quite isolated geographically and therefore the virus started to spread later.

However, those curves clearly show that in Countries that have not taken prompt containment actions, such as UK, US and Sweden, those curves started to take a steeper shape.

In [105]:
# Comparing cumulative confirmed cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, denmark_conf_0_perc, ".", '-', 3, "Denmark"),
               (days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
               (days_tot, sweden_conf_0_perc, ".", '-', 8, "Sweden"),
               (days_tot, iceland_conf_0_perc, ".", '-', 4, "Iceland"),
               (days_tot, estonia_conf_0_perc, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other Scandinavian Countries plus Estonia \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [106]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0_perc, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0_perc, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0_perc, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0_perc, ".", '-', 3, "France"),
               (days_tot, switzerland_conf_0_perc, ".", '-', 6, "Switzerland"),
               (days_tot, luxembourg_conf_0_perc, ".", '-', 9, "Luxembourg"),
               (days_tot, belgium_conf_0_perc, ".", '-', 8, "Belgium"),
               (days_tot, ireland_conf_0_perc, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [107]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, netherlands_conf_0_perc, ".", '-', 4, "Netherlands"),
               (days_tot, austria_conf_0_perc, ".", '-', 3, "Austria"),
               (days_tot, portugal_conf_0_perc, ".", '-', 2, "Portugal"),
               (days_tot, poland_conf_0_perc, ".", '-', 7, "Poland"),
               (days_tot, uk_conf_0_perc, ".", '-', 6, "UK"),
               (days_tot, us_conf_0_perc, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [108]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
cust_line_plot((days_tot, brazil_conf_0_perc, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0_perc, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0_perc, ".", '-', 1, "India"),
               (days_tot, italy_conf_0_perc, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.7.3. Deceased Cases: Summary of Findings from the Analysis

In the attempt to eliminate the variability due to different testing policies in different Countries, similar plots have been created by taking the deceased cases rather than the cumulative confirmed cases as a reference curve.

Finland has the second lowest curve in Scandinavia, after Norway, and the third lowest if also Estonia is counted. Sweden has the higherst curve. (This is the same result that has been obtained before normalization).

Among the analyzed EU Countries, Belgium is the Country with the highest deceased cases curve, followed by Spain and Italy.

Poland has a deceased cases curve lower than Finland.

In [109]:
# Comparing cumulative deceased cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, denmark_deceas_0_perc, ".", '-', 3, "Denmark"),
               (days_tot, norway_deceas_0_perc, ".", '-', 6, "Norway"),
               (days_tot, sweden_deceas_0_perc, ".", '-', 8, "Sweden"),
               (days_tot, estonia_deceas_0_perc, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other Scandinavian Countries plus Estonia\n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [110]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0_perc, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0_perc, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0_perc, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0_perc, ".", '-', 3, "France"),
               (days_tot, switzerland_deceas_0_perc, ".", '-', 6, "Switzerland"),
               (days_tot, luxembourg_deceas_0_perc, ".", '-', 9, "Luxembourg"),
               (days_tot, belgium_deceas_0_perc, ".", '-', 8, "Belgium"),
               (days_tot, ireland_deceas_0_perc, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [111]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, netherlands_deceas_0_perc, ".", '-', 4, "Netherlands"),
               (days_tot, austria_deceas_0_perc, ".", '-', 3, "Austria"),
               (days_tot, portugal_deceas_0_perc, ".", '-', 2, "Portugal"),
               (days_tot, poland_deceas_0_perc, ".", '-', 7, "Poland"),
               (days_tot, uk_deceas_0_perc, ".", '-', 6, "UK"),
               (days_tot, us_deceas_0_perc, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [112]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
cust_line_plot((days_tot, brazil_deceas_0_perc, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0_perc, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0_perc, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0_perc, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.8. Demographic Considerations

In [113]:
print("The range of the median age in the EU Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(median_age_range))
The range of the median age in the EU Countries that are analyzed here is: 9.0 years

7.9. Normalizing by Country Population and Population Density

In this section the cumulative confirmed cases are divided by the population and by the population density, in the attempt to eliminate the effect of those two variables. By assuming that the pollution level can be correlated to the population density (which is not necessarily true) and by ignoring the possible effects of the median age, genetics, climate (latitude) and possible virus mutations, the resulting curve might give an indication of the effect of the containment policies and of the people behavior.

Those assumptions might be reasonable when comparing the Scandinavian Countries.

Since the testing policy of different Countries might differ, the cumulative confirmed cases are not necessarily a good statistical representation of the actual number of infections. This is the reason why also the cumulative deceased cases are considered.

NOTE:

This approach assumes uniform distribution of the population in the all territory. Countries like Iceland (or Norway) where average population density is not very high but population is concentrated in few locations show too high curves and therefore are too penalized by this normalization.

Summary of the Results (Confirmed Cases):

Among the Scandinavian Countries, Finland is still one of the most virtuous (even though Denmark has a lower curve). Also Estonia has a lower curve.

Iceland curve is an outlier (with values about 10 times higher compared to the other curves) and has been omitted from the visualization to better show the remaining curves.

By looking into other EU Countries, Finland has the highest curve, followed by Ireland and Spain, while Belgium and Germany show pretty low and close curves.

Another curve that is higher than Finland is US.

Summary of the Results (Deceased Cases):

Denmark has the lowest curve and Sweden has the highest curve in Scandinavia while Finland is the second higest.

In the rest of EU, among the analyzed Countries Spain has the higher curve followed by Ireland.

Poland has the lowest curve. There is a bunch of other Countries with pretty low curves: Germany, Luxemburg, Switzerland, Netherlands, Austria, Portugal.

NOTE:

The curves of certain Countries show cumulative values that decrease on certain days. This is due to incorrect data, as pointed out in the previous sections.

In [114]:
# Comparing cumulative confirmed cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_conf_0_perc/finland_dens, ".", '-', 0, "Finland"),
               (days_tot, denmark_conf_0_perc/denmark_dens, ".", '-', 3, "Denmark"),
               (days_tot, norway_conf_0_perc/norway_dens, ".", '-', 6, "Norway"),
               (days_tot, sweden_conf_0_perc/sweden_dens, ".", '-', 8, "Sweden"),
               (days_tot, estonia_conf_0_perc/estonia_dens, ".", '-', 7, "Estonia"),
               #(days_tot, iceland_conf_0_perc/iceland_dens, ".", '-', 4, "Iceland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other Scandinavian Countries plus Estonia\n"\
                     "in percentage of each Country population "\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [115]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_conf_0_perc/finland_dens, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0_perc/italy_dens, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0_perc/spain_dens, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0_perc/germany_dens, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0_perc/france_dens, ".", '-', 3, "France"),
               (days_tot, switzerland_conf_0_perc/switzerland_dens, ".", '-', 6, 
                "Switzerland"),
               (days_tot, luxembourg_conf_0_perc/luxembourg_dens, ".", '-', 9, 
                "Luxembourg"),
               (days_tot, belgium_conf_0_perc/belgium_dens, ".", '-', 8, "Belgium"),
               (days_tot, ireland_conf_0_perc/ireland_dens, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population "\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [116]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_conf_0_perc/finland_dens, ".", '-', 0, "Finland"),
               (days_tot, netherlands_conf_0_perc/netherlands_dens, ".", '-', 4, 
                "Netherlands"),
               (days_tot, austria_conf_0_perc/austria_dens, ".", '-', 3, "Austria"),
               (days_tot, portugal_conf_0_perc/portugal_dens, ".", '-', 2, "Portugal"),
               (days_tot, poland_conf_0_perc/poland_dens, ".", '-', 7, "Poland"),
               (days_tot, uk_conf_0_perc/uk_dens, ".", '-', 6, "UK"),
               (days_tot, us_conf_0_perc/us_dens, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population "\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [117]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
# and normalized by the density of population
cust_line_plot((days_tot, brazil_conf_0_perc/brazil_dens, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0_perc/russia_dens, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0_perc/india_dens, ".", '-', 1, "India"),
               (days_tot, italy_conf_0_perc/italy_dens, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population"\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [118]:
# Comparing cumulative deceased cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_deceas_0_perc/finland_dens, ".", '-', 0, "Finland"),
               (days_tot, denmark_deceas_0_perc/denmark_dens, ".", '-', 3, "Denmark"),
               (days_tot, norway_deceas_0_perc/norway_dens, ".", '-', 6, "Norway"),
               (days_tot, sweden_deceas_0_perc/sweden_dens, ".", '-', 8, "Sweden"),
               (days_tot, estonia_deceas_0_perc/estonia_dens, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other Scandinavian Countries plus Estonia \n"\
                     "in percentage of each Country population "\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [119]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_deceas_0_perc/finland_dens, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0_perc/italy_dens, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0_perc/spain_dens, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0_perc/germany_dens, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0_perc/france_dens, ".", '-', 3, "France"),
               (days_tot, switzerland_deceas_0_perc/switzerland_dens, ".", '-', 6, 
                "Switzerland"),
               (days_tot, luxembourg_deceas_0_perc/luxembourg_dens, ".", '-', 9, 
                "Luxembourg"),
               (days_tot, belgium_deceas_0_perc/belgium_dens, ".", '-', 8, "Belgium"),
               (days_tot, ireland_deceas_0_perc/ireland_dens, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population "\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [120]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
# and normalized by the density of population
cust_line_plot((days_tot, finland_deceas_0_perc/finland_dens, ".", '-', 0, "Finland"),
               (days_tot, netherlands_deceas_0_perc/netherlands_dens, ".", '-', 4, 
                "Netherlands"),
               (days_tot, austria_deceas_0_perc/austria_dens, ".", '-', 3, "Austria"),
               (days_tot, portugal_deceas_0_perc/portugal_dens, ".", '-', 2, "Portugal"),
               (days_tot, poland_deceas_0_perc/poland_dens, ".", '-', 7, "Poland"),
               (days_tot, uk_deceas_0_perc/uk_dens, ".", '-', 6, "UK"),
               (days_tot, us_deceas_0_perc/us_dens, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population "\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [121]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
# and normalized by the density of population
cust_line_plot((days_tot, brazil_deceas_0_perc/brazil_dens, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0_perc/russia_dens, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0_perc/india_dens, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0_perc/italy_dens, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population"\
                     "and normalized by the density of population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.10. Situation in China

Two sets of plots are shown here: one for the Hubei province where the infection has started and the other one for the rest of China.

The first plot of the two sets shows the cumulative confirmed cases broken down by deceased, recovered and active cases. Whereas in Hubei there has not been yet a second wave, that is the case for the rest of China.

The second plot shows separately the cumulative curves for the confirmed, recovered, deceased and active cases. The curve for the rest of China has been analyzed in details in section 7.1.2.

Note: There is something wrong in the source data for Hubei province on 4/17 since the cumulative recovered cases cannot decrease over time. Also, the incremental data (increment in confirmed cases) for Hubei province from 2/12 has been reported on 2/13.

In [122]:
print("Error data in confirmed cases in Hubei:")
find_error_days(hubei_recov_0)
Error data in confirmed cases in Hubei:
['4/17']
In [123]:
# Plotting daily cumulative cases in Hubei
plot_stacked_bar(days_tot,
                 [hubei_deceas_0, hubei_recov_0, hubei_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in Hubei (China) over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=12, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [124]:
# Plotting daily cumulative cases in Hubei
cust_line_plot((days_tot, hubei_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, hubei_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, hubei_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, hubei_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Hubei (China) over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [125]:
# Plotting daily increments in confirmed cases in Hubei province in China
cust_bar_plot((days_tot, hubei_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in Hubei province (China)",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [126]:
# Plotting daily cumulative cases in the rest of China
plot_stacked_bar(days_tot,
                 [restchina_deceas_0, restchina_recov_0, restchina_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in China either than Hubei over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=12, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [127]:
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, restchina_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in China "\
                     "either than Hubei over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [128]:
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in China "\
                     "either than Hubei",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.11. Situation in Italy

Italy has been the first Country after China (and the first Country in Europe) that has been hit hard from the virus and its government, as opposite to Finland, keeps very comprehensive public data. Therefore, analysis of Italian curves might be useful also to have some hints about Finnish situation.

Italy has had a very confusing strategy and decision-making process in the beginning of the epidemic and this has been probably one of the causes of the high number of cases. However, after an initial period of very poor handling of the situation, quite strict containment measures have been decided and this has led to curves whose shape that are quite close to the curves from China with the main difference that the slowed down phase has been smoother. So, there has been a exponential increase in the number of confirmed cases, followed by a short linear phase and a quite long slowed down phase, which is still ongoing.

Note:

  • Data from 3/12 has been reported on 3/13.
  • The confirmed cases on 6/19 are wrong since the incremental value cannot be negative.
In [129]:
# Plotting daily cumulative cases in Italy
cust_line_plot((days_tot, italy_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, italy_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, italy_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, italy_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Italy over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [130]:
# Plotting daily cumulative cases in Italy
plot_stacked_bar(days_tot,
                 [italy_deceas_0, italy_recov_0, italy_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in Italy over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=12, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [131]:
# Plotting new daily confirmed Coronavirus cases in Italy
cust_bar_plot((days_tot, italy_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [132]:
# Plotting increments in the active cases in Italy
cust_bar_plot((days_tot, calc_increments(italy_act_0), 1, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 increments in the active cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [133]:
# Plotting new daily deceased cases in Italy
cust_bar_plot((days_tot, italy_deceas_incr_0, 3, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily deceased cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.12. World View

By looking the all world, the virus is still in the linear growing phase and there is no sign of slowing down.

In [134]:
# Plotting daily cumulative cases in the all world
cust_line_plot((days_tot, world_conf_tot, ".", '-', 0, "confirmed cases"),
               (days_tot, world_recov_tot, ".", '-', 2, "recovered cases"),
               (days_tot, world_deceas_tot, ".", '-', 3, "deceased cases"),
               (days_tot, world_act_tot, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in the all world "\
                     "over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [135]:
# Plotting new daily cases in the all world
cust_bar_plot((days_tot, world_conf_incr, 0, ""),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily confirmed cases in the all world",
              title_fs=18, title_offset=20,
              rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [136]:
# Plotting increments in the active cases in the all world
cust_bar_plot((days_tot, calc_increments(world_act_tot), 1, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 increments in the active cases "\
                     "in the all world",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=12, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.12.1. Lethality

The estimated average daily number of deaths due to other reasons has been added with the only scope of putting the numbers into context.

In this comparison the deaths by other reasons are estimated with a linear model, which is clearly an approximation since, for example, seasonal flu and suicides follows certain yearly patterns.

On 4/16 the number of reported deaths due to COVID-19 has overtaken the estimated number of deaths due to seasonal flu since the start of the year.

On 5/13 the number of reported deaths due to COVID-19 has overtaken the estimated number of deaths by suicide since the start of the year.

Currently, the number of deaths by COVID-19 grows somehow linearly at about 5000 deaths/day. Therefore, unless this growth will slow down, the number of estimated deaths due to other reasons (like for example road traffic accidents) might at a certain point become higher.

By assuming the number of COVID-19 reported deaths worldwide will stay constant at around 4000/day for the rest of the year, by the end of the year the number of deaths by COVID-19 worldwide would reach about 1.2 millions, whereas the number of deaths by seasonal flu is estimated to be around 470.000 (-/+ 38%). This means an overall COVID-19 mortality rate 2.6 times higher than seasonal flu and slightly lower than traffic road accidents (1.3 millions).

The following shall be noted:

  • The deaths by COVID-19 might be under estimated due to the fact that not all the population is tested
  • The average deaths by seasonal flu in year 2020 might be less than normal due to the high hand hygiene that has been introduced due to the novel Coronavirus. Similarly, the deaths due to road traffic accidents might be slight less than expected due to the reduced mobility of people due to containment measures
  • This comparison tells nothing about the IFR. In particular, it should be noted that, without the containment measures that have been adopted worldwide, the number of COVID-19 deaths would have been very likely considerably higher

Sources for the additional info:
- https://www.worldometers.info/
- https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/
- https://www.who.int/mediacentre/events/meetings/2011/road_safety/en/
- https://www.who.int/news-room/fact-sheets/detail/tobacco

In [137]:
# Plotting new daily deceased cases in the all world
cust_bar_plot((days_tot, world_deceas_incr, 3, 
               "Daily (reported) deceased cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily deceased cases "\
                    "in the all world",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=12, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_y=1288,
              first_line_y_l="Average daily estimated deaths by seasonal flu",
              second_line_y=2192,
              second_line_y_l="Average daily estimated deaths by suicides",
              third_line_y=3561,
              third_line_y_l="Average daily estimated number of deaths "\
                             "by road traffic accidents",
              #fourth_line_y=19178,
              #fourth_line_y_l="Average daily estimated deaths by direct tobacco smoking"
             )
In [138]:
# Creating a series containing the number of deaths by different causes
# so far this year
deceas_causes = pd.Series([world_deceas_tot.iloc[-1],
                           1288*(len(days_tot)+21),
                           2192*(len(days_tot)+21),
                           3561*(len(days_tot)+21),
                           19178*(len(days_tot)+21)],
                          index=["Reported deaths by COVID-19",
                                 "Estimated deaths by seasonal flu",
                                 "Estimated deaths by suicides",
                                 "Estimated deaths by road traffic accidents",
                                 "Estimated deaths by direct tobacco smoking"])
In [139]:
# Showing the number of deaths by different causes so far this year in a bar plot
plot_cust_hbar(deceas_causes.sort_values(),
               figsize_w=8, figsize_h=6,
               frame=False, grid=False,
               ref_font_size=12,
               title_text="Number of deaths by different causes so far this year "\
                          "compared to COVID-19",
               title_offset=20,
               color_numb=3,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=False,
               visible_digits=2)
In [140]:
# Estimating worldwide COVID-19 deaths by the end of the year
# (hypothesis: constant growth)
# (estimation date: May 27th)
est_flu_deaths_2020 = 1288*365
est_road_acc_deaths_2020 = 3561*365
est_COVID19_deaths_2020 = 350453+4000*218
print("Estimated deaths by COVID-19 by the end of the year: {}\n"\
      "(Hypothesis: constant growth).".\
      format(est_COVID19_deaths_2020))
print("Estimated deaths by seasonal flu by the end of the year: {}.".\
      format(est_flu_deaths_2020))
print("Estimated deaths by traffic road accidents by the end of the year: {}.".\
      format(est_road_acc_deaths_2020))
# Comparing the result with other causes of deaths
COVID_flu_ration = round(est_COVID19_deaths_2020/est_flu_deaths_2020, 2)
COVID_road_ration = round(est_COVID19_deaths_2020/est_road_acc_deaths_2020, 2)
print("\nBy assuming a constant increase in the number of COVID-19 deaths,\n"\
      "by the end of the year the number of COVID-19 deaths will be {} times \n"\
      "the number of estimated deaths by seasonal flu and {} times the number of \n"\
      "estimated deaths by traffic road accidents.".\
      format(COVID_flu_ration, COVID_road_ration))
Estimated deaths by COVID-19 by the end of the year: 1222453
(Hypothesis: constant growth).
Estimated deaths by seasonal flu by the end of the year: 470120.
Estimated deaths by traffic road accidents by the end of the year: 1299765.

By assuming a constant increase in the number of COVID-19 deaths,
by the end of the year the number of COVID-19 deaths will be 2.6 times 
the number of estimated deaths by seasonal flu and 0.94 times the number of 
estimated deaths by traffic road accidents.

8. Statistics

8.1. World View

In [141]:
# Reordering the columns
daily_rep_group = daily_rep_group.reindex(columns=['Confirmed',
                                                   'Recovered',
                                                   'Deaths',
                                                   'Active'])
In [142]:
print("Grand Total Worldwide:\n")
print(daily_rep_group.sum().to_string())
# Confirmed cases in percentage of the total population
cont_perc_world = daily_rep_group.sum()[0]/(7.8*1000000000)*100
print("\nConfirmed cases in percentage of the total population:")
print("{:.2f}".format(cont_perc_world))
Grand Total Worldwide:

Confirmed    9979535
Recovered    5051864
Deaths        498710
Active       4350809

Confirmed cases in percentage of the total population:
0.13
In [143]:
# Mortality (worldwide)
mort = (daily_rep_group.sum()[2]/daily_rep_group.sum()[0])*100
print("'Calculated' mortality worldwide: {:.2f}\n".format(mort))
print("IMPORTANT NOTE:\nThe actual mortality could be much lower",
      "due to the fact that not all infected people\nhave been tested!\n"
      "On the other hand, the counted deaths are due to infections that happened",
      "weeks ago.\nThis means that, as long as the contagius cases increase, "
      "the calculated mortality\nis under-estimated.")
'Calculated' mortality worldwide: 5.00

IMPORTANT NOTE:
The actual mortality could be much lower due to the fact that not all infected people
have been tested!
On the other hand, the counted deaths are due to infections that happened weeks ago.
This means that, as long as the contagius cases increase, the calculated mortality
is under-estimated.

8.2. Top Ten Countries

In [144]:
# The top 10 Countries by number of confirmed cases in descending order
conf_top_10 = daily_rep_group.sort_values(by ='Confirmed', ascending = False).\
              head(10)['Confirmed']
In [145]:
# Showing the top 10 Countries by number of confirmed cases in a bar plot
plot_cust_hbar(conf_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of confirmed cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=0,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [146]:
# The top 10 Countries by number of recovered cases in descending order
recov_top_10 = daily_rep_group.sort_values(by ='Recovered', ascending = False).\
               head(10)['Recovered']
In [147]:
# Showing the top 10 Countries by number of recovered cases in a bar plot
plot_cust_hbar(recov_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of recovered cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=2,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [148]:
# The top 10 Countries by number of deceased cases in descending order
deceas_top_10 = daily_rep_group.sort_values(by ='Deaths', ascending = False).\
                head(10)['Deaths']
In [149]:
# Showing the top 10 Countries by number of deceased cases in a bar plot
plot_cust_hbar(deceas_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of deceased cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=3,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [150]:
# Fixing the current number of active cases in US
# (due to mistake in source data)
daily_rep_group.at['US', 'Active'] = \
daily_rep_group.loc['US', ['Confirmed']][0] - \
daily_rep_group.loc['US', ['Recovered']][0] - \
daily_rep_group.loc['US', ['Deaths']][0]
In [151]:
# The top 10 Countries by number of active cases in descending order
act_top_10 = daily_rep_group.sort_values(by ='Active', ascending = False).\
             head(10)['Active']
In [152]:
# Showing the top 10 Countries by number of active cases in a bar plot
plot_cust_hbar(act_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of active cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=1,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=False,
               visible_digits=2)
In [153]:
print("\n(*) Note that for certain Countries the figures in the previous three tables",
      "contain also off shore territories.")
print("For example, for France the numbers include:\n\n",
      "- French Polynesia\n",
      "- New caledonia\n",
      "- St Martina\n",
      "- Saint Barthelemyia\n",
      "- French Guiana\n",
      "- Guadelupe\n",
      "- Mayotte\n",
      "- Reunion\n")
(*) Note that for certain Countries the figures in the previous three tables contain also off shore territories.
For example, for France the numbers include:

 - French Polynesia
 - New caledonia
 - St Martina
 - Saint Barthelemyia
 - French Guiana
 - Guadelupe
 - Mayotte
 - Reunion

8.3. Finland

In [154]:
# Visualizing the current status in Finland
print("Latest situation in Finland:\n")
print(daily_rep_group.loc['Finland'].to_string())
# Confirmed cases in percentage of the total population
cont_perc_fin = daily_rep_group.loc['Finland'][0]/(5.513*1000000)*100
print("\nConfirmed cases in percentage of the total population:")
print("{:.2f}".format(cont_perc_fin))
Latest situation in Finland:

Confirmed    7198
Recovered    6600
Deaths        328
Active        270

Confirmed cases in percentage of the total population:
0.13

9. Conclusions

Currently, the number of recorded COVID-19 cases is about 0.13% of the world population and has produced already more deaths than seasonal flu worldwide.

Even though the virus originated from China, it has spread west to Europe and then further west to US and South America.

The first wave has been over in China around the end of winter and in most of Europe around the end of spring.

Currently, most of the active cases are in US, followed by Brazil, Russia, India and UK. Also, in China a second wave has followed and at the beginning of the summer there are signs of a possible third wave.

Unfortunately, the Finnish Institute for Health and Welfare (THL) does not seem to keep a complete public API for uploading daily the time series. In particular, there is no reliable estimate of the number of recovered cases and therefore it is not possible to get a reliable curve for the active cases, which is actually the most important curve to follow the evolution of the epidemic.

The confirmed reported cases are about 0.13% of the Finnish population. The Finnish curve of the confirmed cases is currently in a slowing down growing phase. The confirmed cases curve is the lowest pro capita in Scandinavia and is one of the lowest in Europe. The same applies to the cumulative deceased cases, suggesting that the low curve in the cumulative confirmed cases might not be due to a too relaxed testing policy.

Even though the actual percentage of people that have been in contact with the virus is certainly higher, it should be noted that such low numbers suggest that the immunity in Finland is at very low levels currently (there is still a quite high percentage of susceptible people).

There might be different reasons the relative difficulty of the virus to spread in Finland, including remote geographical location, low population density, low level of pollution, culture and local practices (as keeping physical distances when greeting, spending a lot of time outdoor in Nature and vising sauna frequently) and prompt containment actions.

In a further study it would be interesting to verify those assumptions scientifically.

10. Acknowledgements

Many thanks to Johns Hokpins University for sharing and maintaining daily the source csv files.

Many thanks to Coursera for providing a very informative course.

Many thanks to colleagues and friends who have contributed by providing links and comments.


In [155]:
print("Last plotted day:", dt.datetime.strptime(last_day, "%m-%d-%Y").\
      date().strftime("%d-%b-%Y"))
end_time = dt.datetime.utcnow()
script_duration = end_time - start_time
print("\nRunning time for the full script (hh:mm:ss):", script_duration)
Last plotted day: 27-Jun-2020

Running time for the full script (hh:mm:ss): 0:02:25.471477

Used software:
- Jupyter Notebook server 6.0.1
- Python 3.6.8
- numpy 1.18.2
- pandas 1.0.3
- matplotlib 3.1.2
- seaborn 0.9.0
- regex 2019.8.19
on top of Linux Ubuntu 18.04